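It means that LAM's communication layer (the RPI, or Request
Progression Interface) failed to initialize, perhaps because of a
shared memory problem. That a reboot clears the error would be
consistent with that guess, but I can't say for sure from this output.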
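
If you want to test the shared memory guess on an affected machine
before rebooting it, here is a minimal sketch in Python. It rests on
several assumptions that are not confirmed by anything in this thread:
that the node runs Linux, that the RPI in use relies on System V shared
memory, and that leftover segments from earlier jobs are what gets in
the way. The function name orphaned_segments is made up for this
example. It reads the kernel's table in /proc/sysvipc/shm and lists the
segments that no process is attached to.

```python
# Sketch only: assumes Linux and System V shared memory (see lead-in).

def orphaned_segments(table_text):
    """Return segments with zero attached processes.

    table_text is the content of /proc/sysvipc/shm: one header line
    naming the columns, then one whitespace-separated row per segment.
    Columns are looked up by header name, not by position.
    """
    lines = [ln.split() for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    col = {name: i for i, name in enumerate(header)}
    found = []
    for row in rows:
        if len(row) < len(header):
            continue  # skip truncated or malformed rows
        if int(row[col["nattch"]]) == 0:
            found.append({
                "shmid": int(row[col["shmid"]]),
                "uid": int(row[col["uid"]]),
                "size": int(row[col["size"]]),
            })
    return found


if __name__ == "__main__":
    try:
        with open("/proc/sysvipc/shm") as f:
            text = f.read()
    except OSError as err:
        print("could not read /proc/sysvipc/shm:", err)
    else:
        for seg in orphaned_segments(text):
            print("shmid %(shmid)d uid %(uid)d size %(size)d bytes" % seg)
```

Run it after lamhalt, when no MPI jobs should be active on the node. If
it lists segments owned by the user who ran the failed jobs, that
supports the guess; the standard ipcs -m and ipcrm -m <shmid> commands
can then inspect and remove them without a reboot. If it lists nothing,
the cause is probably elsewhere.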
If you're just starting with MPI, I suggest that you go with Open MPI
instead of LAM/MPI -- LAM is fairly static and not evolving anymore.
We're concentrating all of our effort on Open MPI these days
(www.open-mpi.org).
On Nov 15, 2007, at 7:37 AM, Sara Campos wrote:
> Hello,
>
> We are LAM/MPI beginners who are using parallelization to run
> molecular simulation programs.
> We have observed in some machines the following error (which seems
> to be solved when we reboot the machine):
>
> The selected RPI failed to initialize during MPI_INIT. This is a
> fatal error; I must abort.
>
> This occurred on host model24.itqb.unl.pt (n0).
> The PID of failed process was 30686 (MPI_COMM_WORLD rank: 0)
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 30687 failed on node n0 (193.136.181.161) with exit status 1.
>
> This error is serious to us because it kills all our parallel
> jobs that are directed to the problematic machine by the queuing
> system.
> The lam was simply installed by rpm in all machines with no
> further configuration. The commands we use are lamboot <machines>,
> mpirun C <executable> and lamhalt.
> We tried to search the manual but it is a bit too advanced for
> us. Can you explain to us what the problem is and how it can be solved?
>
> Thanks in advance
>
> Sara Campos
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems