On May 7, 2008, at 8:36 PM, richard pan wrote:
> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> --------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
I believe this error message says it all -- one of your processes has
died.
Specifically: MPI_BARRIER isn't what caused your app to die;
MPI_BARRIER is the function that noticed that the other process was
dead, reported the problem, and then aborted all remaining MPI
processes.
Your program is a bit too long for me to debug; Brian's advice of
running it under a debugger is probably your best bet. Also check for
core files, which may indicate where your program died.
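
If no core files show up, the core size limit may be set to zero for
your processes. Here is a minimal sketch of one way to raise it from
inside the program itself, assuming a POSIX system. The helper name
enable_core_dumps is hypothetical (it is not part of LAM or MPI), and
where the core file is actually written depends on your system's
configuration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/*
 * Hypothetical helper (not part of LAM or MPI): raise this process's
 * soft core-file size limit up to its hard limit, so that a crash can
 * leave a core file behind. Returns 0 on success, -1 on failure.
 */
int enable_core_dumps(void)
{
    struct rlimit lim;

    /* Read the current soft and hard limits for core files. */
    if (getrlimit(RLIMIT_CORE, &lim) != 0) {
        fprintf(stderr, "getrlimit(RLIMIT_CORE): %s\n", strerror(errno));
        return -1;
    }

    /* An unprivileged process may raise its soft limit up to the hard limit. */
    lim.rlim_cur = lim.rlim_max;

    if (setrlimit(RLIMIT_CORE, &lim) != 0) {
        fprintf(stderr, "setrlimit(RLIMIT_CORE): %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

Calling something like this at the top of main(), before MPI_Init, makes
each rank set its own limit regardless of how it was launched. If the
hard limit itself is zero, this will not help; the limit then has to be
raised in the environment that starts the processes. Once you have a
core file, loading it into a debugger together with your executable
should show where that process died.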
--
Jeff Squyres
Cisco Systems