Hi,
I'm in the process of upgrading from version 6.3 to 7.1.
I've got LAM daemons running on my master and slave machines. When I
execute mpirun with an application schema, I get:
MPI_Init: LAM error: Unknown error 471
------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
------------------------------------------------------------------------
////////////////////////////////////////////////////////////////////////
My master and slave processes do call MPI_Init. I think the error
message for 471 is coming from the slave process, which then quits
before my master process gets a chance to call MPI_Init. That in turn
generates the message about not invoking MPI_INIT before quitting.
This part of the code works fine with version 6.3. Are there some
changes between the releases that I'm not aware of?
I've seen some conflicting documentation: one source says MPI_Init
needs to be called by all processes, while another help file says only
the master or one of the slave machines needs to call MPI_Init. In
either case, what is Unknown error 471, and which LAM/MPI source file
is it coming from?
Here is my command:
mpirun -t -c2c -O -w -x $LAM_EXPORT myapp
where LAM_EXPORT=PATH,LD_LIBRARY_PATH,DISPLAY,LAMHOME
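To make the expansion of the -x argument explicit, here is a small
sketch that only prints the command line as mpirun would receive it. It
does not run LAM, and the `cmd` variable is just a name chosen for
illustration.

```shell
# Variables to export to the remote processes (values as given above).
LAM_EXPORT=PATH,LD_LIBRARY_PATH,DISPLAY,LAMHOME

# Build the command line as a string so it can be inspected
# without LAM being installed or lambooted.
cmd="mpirun -t -c2c -O -w -x $LAM_EXPORT myapp"
echo "$cmd"
```

The printed line shows that -x receives a single comma-separated list of
variable names.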
The myapp file contains:
n0 /afs/tda/sti/r33/prod/linux24_64/tools/tb/bin-64/TWTgen
parallelprocess=yes experiment=ya lbist=yes
n1 /afs/tda/sti/r33/prod/linux24_64/tools/tb/bin-64/TWTgenfm
experiment=ya lbist=yes parallelprocess=yes
I'm running TWTgen on the master and TWTgenfm on the slave. They are
the same program with different entry points.