
LAM/MPI General User's Mailing List Archives


From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-10-03 09:16:23


The BLCR error message:
cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
typically indicates that the PID of the original process is currently
in use. This is normally caused by trying to restart a process while
it is still running.

From your message, it looks like you are killing the mpirun process
(PID 22286), not the application processes. It is therefore likely
that the application processes are still running on the machine(s),
which causes the BLCR restart to fail. Try killing all of the
application processes, running lamclean, and/or sending SIGTERM to
mpirun instead of SIGKILL (kill -9), which mpirun cannot catch.
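
One way to confirm that a checkpointed PID is still occupied before retrying lamrestart is to probe it with signal 0. The sketch below is only an illustration: the helper name pid_in_use is hypothetical and is not part of BLCR or LAM/MPI. It relies on the POSIX behaviour that kill() with signal 0 performs the existence and permission checks without delivering a signal.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a BLCR or LAM/MPI function).
 * Returns  1 if a process with this PID currently exists,
 *          0 if no such process exists,
 *         -1 if the PID argument is not a valid single-process PID.
 */
static int pid_in_use(pid_t pid)
{
    if (pid <= 0)
        return -1;              /* 0 and negative values address process groups */

    if (kill(pid, 0) == 0)
        return 1;               /* process exists and we may signal it */

    /* EPERM: the process exists but belongs to another user.
     * ESRCH: no process has this PID. */
    return (errno == EPERM) ? 1 : 0;
}
```

If pid_in_use() returns 1 for any PID recorded in the context files, that process (or an unrelated one that has since taken the same PID) is still present, and the restart can be expected to fail with the "Device or resource busy" error until it is gone. Note that a zombie that has not yet been reaped also counts as existing.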

If that doesn't help, let me know.

-- Josh

On Sep 28, 2006, at 12:10 PM, Rui Ramos wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
>
> Hi there,
>
> I am integrating BLCR with LAM/MPI and I'm facing some issues.
>
> When trying to restart from a context file with lamrestart, I get:
>
> "cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy"
>
> This is what I've done:
> - laminfo
> LAM/MPI: 7.1.2
> Prefix: /opt/lam
> Architecture: x86_64-unknown-linux-gnu
> Configured by: root
> Configured on: Thu Sep 28 15:23:39 WEST 2006
> Configure host: XXXXXXXXX
> Memory manager: ptmalloc2
> C bindings: yes
> C++ bindings: yes
> Fortran bindings: yes
> C compiler: gcc
> C++ compiler: g++
> Fortran compiler: g77
> Fortran symbols: double_underscore
> C profiling: yes
> C++ profiling: yes
> Fortran profiling: yes
> C++ exceptions: no
> Thread support: yes
> ROMIO support: yes
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI boot: globus (API v1.1, Module v0.6)
> SSI boot: rsh (API v1.1, Module v1.1)
> SSI boot: slurm (API v1.1, Module v1.0)
> SSI coll: lam_basic (API v1.1, Module v7.1)
> SSI coll: shmem (API v1.1, Module v1.0)
> SSI coll: smp (API v1.1, Module v1.2)
> SSI rpi: crtcp (API v1.1, Module v1.1)
> SSI rpi: lamd (API v1.0, Module v7.1)
> SSI rpi: sysv (API v1.0, Module v7.1)
> SSI rpi: tcp (API v1.0, Module v7.1)
> SSI rpi: usysv (API v1.0, Module v7.1)
> SSI cr: blcr (API v1.0, Module v1.1)
> SSI cr: self (API v1.0, Module v1.0)
>
> - I've tested BLCR with single processes without LAM and it works.
> - Then I tried a simple test program:
>
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char **argv)
> {
>     int node;   /* MPI rank of this process */
>     int i, j;
>     float f;
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &node);
>
>     printf("Hello World from Node %d.\n", node);
>     /* Long-running loop that keeps the job alive while printing progress. */
>     for (j = 0; j <= 100000; j++)
>         for (i = 0; i <= 100000; i++) {
>             f = i * 2.718281828 * i + i + i * 3.141592654;
>             printf("Interaction [i=%d,j=%d]=%f\n", i, j, f);
>         }
>     MPI_Finalize();
>     return 0;
> }
>
> compiled it with:
>
> mpicc -o hello-lam mpihello.c -L/opt/lam/lib/ -I/opt/lam/include -lmpi
>
> executed with:
>
> mpirun -ssi cr blcr C hello-lam
>
> Created the context file with:
>
> lamcheckpoint -ssi cr blcr -pid 22286 -ssi cr_blcr_base_dir /tmp
>
> Kill the process:
>
> kill -9 22286
>
> Try the restart from context:
>
> lamrestart -ssi cr blcr -ssi cr_blcr_context_file /tmp/context.mpirun.22286
>
> which returns:
>
> cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
>
>
> Any ideas how I could solve this?
>
> I'd appreciate any help :)
>
>
> PS: blcr is version blcr-0.4.2
>
> - --
> Rui Ramos
> ==============================================
> Universidade do Porto - IRICUP
> Praça Gomes Teixeira, 4099-002 Porto, Portugal
> email: rramos[at]iric.up.pt
> ==============================================
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
>
> iQEVAwUBRRvz271uR0bdnTWSAQKZGQgAsdM+cOMdvT38ysVJdwO/+JN707Fwli07
> mNcbpb0UsKIaQbkycbL7B/o/zW5VC45j5Nmkbn0s/v48A+9MPIzQZQ7qE6azU2vG
> wL5mMo1bqcHEPusDgNXbLoK7HHVKcGTBBOR7UqXcRyfpUFx/ohJiNp/YutZlIkN/
> DXWHd4PW3EXVBgoKn0SUkgIJ8Rk7tE1D2TUTc+7TzqV6lXoxtA6sOoqyFEOZYzjc
> UZLDKFG9KF/r8EMIU7/Px0YxJn2kODHFzNq6VmoWmdT27QLfLjqUdvFrciSTNSyt
> i5xtfEbHWVaTOFr0QSFVz0mf3wG31Wgg2Wq43kRbbvjAfR4VsqT/uQ==
> =BIeF
> -----END PGP SIGNATURE-----
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/