This means that one or more of the processes you started quit before
reaching MPI_INIT. There are many possible reasons for this.
The most common are:
1. Running different versions of LAM on different machines
2. Compiling and linking different executables with different versions of
LAM
3. Leftover state (stray processes, for example) from previous application
runs. Use "lamclean" to get rid of it.
4. An application with the same name as yours is found before your
application in the $PATH variable, so the wrong program is executed. Make
sure the right application is the one being run.
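
For reason 4, a quick way to see which program would be picked up is to list
every executable with that name in $PATH order. The following is only a rough
sketch in Python, not anything that ships with LAM: the function name
find_on_path is made up for illustration, and it assumes the processes on a
node see the same $PATH as the shell you run the script from, which may not
hold for the environment the LAM daemon was booted with.

```python
import os


def find_on_path(name, path=None):
    """Return every executable called `name` on the search path, in lookup order.

    `path` defaults to the current $PATH. The first entry of the result is
    the one a PATH lookup would normally run.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    matches = []
    for directory in path.split(os.pathsep):
        if not directory:
            # Empty entries are skipped here for simplicity; a POSIX shell
            # treats them as the current directory.
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches
```

Calling find_on_path with the name of your executable on each node and
comparing the first entry of each result should show whether some node
resolves the name to a different (possibly non-MPI) program.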
Hope this helps,
Anju
On Mon, 21 Jun 2004, Yaron Minsky wrote:
> This is an extension of a problem I've mentioned on the list before.
> Basically, LAM gets into a state where it's unwilling to run certain
> executables. It fails immediately and gives me an error message like
> this:
>
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> My current guess is that this message indicates that a previously run
> execution terminated uncleanly. What's weird is that this message
> comes up when I try to mpirun some executables but not others. Often
> when this happens, I get a condition like the one I mentioned before,
> where there is a dead process hanging out on one of the nodes, and
> sometimes, after I kill said process, the problem is that the
> executable on that machine may be different from all the other
> machines. Recently, however, I've seen this problem without any of
> these other issues.
>
> So, can someone explain to me when this message comes up and what it means?
>
> y
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>