LAM/MPI General User's Mailing List Archives

From: Sara Campos (scampos_at_[hidden])
Date: 2007-11-15 10:37:25


Hello,

    We are LAM/MPI beginners who use it to run molecular simulation
programs in parallel.
On some machines we have observed the following error (which seems to go
away when we reboot the machine):

The selected RPI failed to initialize during MPI_INIT. This is a
fatal error; I must abort.

This occurred on host model24.itqb.unl.pt (n0).
The PID of failed process was 30686 (MPI_COMM_WORLD rank: 0)
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 30687 failed on node n0 (193.136.181.161) with exit status 1.

    This error is a serious problem for us because it kills every parallel
job that the queuing system sends to the affected machine.
    LAM was simply installed from the RPM on all machines, with no further
configuration. The commands we use are lamboot <machines>, mpirun C
<executable> and lamhalt.
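
    For reference, the session amounts to roughly the following sketch. The
function names (run, lam_session) and the DRY_RUN switch are only
illustrative and are not part of LAM; the host file and executable are
whatever is passed in. Only lamboot, mpirun C and lamhalt are the actual
commands we run.

```shell
#!/bin/sh
# Sketch of the three-step LAM session described above.
# Illustrative helper: with DRY_RUN=1 the commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# lam_session HOSTFILE PROGRAM
# Boots LAM on the hosts listed in HOSTFILE, starts PROGRAM on every
# available CPU ("C"), then shuts LAM down even if the run failed.
lam_session() {
    hostfile=$1
    program=$2
    run lamboot "$hostfile" || return 1
    run mpirun C "$program"
    status=$?
    run lamhalt
    return "$status"
}
```

Called as, for example, lam_session machines ./simulation (both names are placeholders), this returns the exit status of mpirun, which is nonzero when the MPI_INIT failure shown above occurs.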
    We tried searching the manual, but it is a bit too advanced for us.
Can you explain what the problem is and how it can be solved?

Thanks in advance

Sara Campos