This is an extension of a problem I've mentioned on the list before.
Basically, LAM gets into a state where it's unwilling to run certain
executables. It fails immediately and gives me an error message like
this:
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
My current guess is that this message indicates that a previous run
terminated uncleanly. What's weird is that it comes up when I try to
mpirun some executables but not others. Often when this happens, I see
the condition I mentioned before, where a dead process is left hanging
around on one of the nodes. Sometimes, after I kill that process, it
turns out that the executable on that machine may be different from
the one on all the other machines. Recently, however, I've seen this
problem without either of these other issues.
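For the mismatched-executable case, here is a minimal sketch (in Python)
of how one could compare checksums of the binary as seen on each node.
The helper names file_digest and odd_nodes and the node labels are made
up for illustration, and how the digests get collected from each node is
left out. None of this is part of LAM.

```python
import hashlib
from collections import Counter


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large executables are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def odd_nodes(digests):
    """Given a dict of node label -> digest, return the sorted list of
    nodes whose digest differs from the most common one."""
    if not digests:
        return []
    majority, _ = Counter(digests.values()).most_common(1)[0]
    return sorted(node for node, d in digests.items() if d != majority)
```

Running file_digest on the executable on each node and passing the
results to odd_nodes would point at any machine holding a different
copy. It says nothing about why the copies diverged.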
So, can someone explain to me when this message comes up and what it means?