LAM/MPI General User's Mailing List Archives

From: Zhenxia Zhang (majorzzx_at_[hidden])
Date: 2006-04-17 10:15:39


I ran the commands below and found that lamrestart fails after the executable file has been renamed.

At first, checkpoint and restart work well.
------------------------------------------------------------------------------------
$ mpirun -ssi cr blcr C ./hello
$ lamcheckpoint -ssi cr blcr -pid 1280
$ lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.1280
------------------------------------------------------------------------------------

Then I renamed the executable, and lamrestart failed.
---------------------------------------------------------------------------------------
$ mv hello hello2
$ lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.1280
cri_syscall(CR_OP_RSTRT_REAP): Invalid argument
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

I renamed the file back to its original name, but lamrestart still fails, now with a different error.
--------------------------------------------------------------------------------------
$ mv hello2 hello
$ lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.1280
cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
cri_syscall(CR_OP_RSTRT_REQ, &req): Device or resource busy
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

Could you tell me what I have done wrong? Thanks.

Sincerely,
Zhenxia Zhang