On May 27, 2008, at 3:51 PM, debejyo chakraborty wrote:
> I am getting the following error, though everything seems to be fine
> in the code. Am I missing something? Below are examples of the
> errors I'm seeing when my code crashes.
>
>
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 29885 failed on node n1 (129.219.39.162) due to signal 11.
> -----------------------------------------------------------------------------
>
> MPI_Recv: process in local group is dead (rank 34, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 16, MPI_COMM_WORLD)
>
> Rank (16, MPI_COMM_WORLD): Call stack within LAM:
> Rank (16, MPI_COMM_WORLD): - MPI_Recv()
> Rank (16, MPI_COMM_WORLD): - main()
One of your processes segfaulted (signal 11 is SIGSEGV), which usually
indicates a memory allocation or buffer overrun bug in the application.
The MPI_Recv errors on the other ranks are most likely just a consequence
of that process dying. You might want to use a memory-checking debugger
such as Valgrind to pin down the problem. Have a look at the debugging
section of the LAM/MPI FAQ for more information:
http://www.lam-mpi.org/faq/category6.php3
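
As a rough illustration of the kind of bug that tends to produce a signal 11 around MPI_Recv, here is a minimal sketch in C of two defensive checks: one for the allocation and one for the receive count. The helper names alloc_recv_buffer and recv_count_fits are hypothetical (they are not part of LAM/MPI or your code), and this only assumes the crash comes from a receive buffer that is unallocated or too small, which may not be your actual problem.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a zeroed receive buffer for `count`
 * elements of `elem_size` bytes each. Returns NULL for a non-positive
 * count, a size overflow, or a failed allocation, so the caller can
 * stop cleanly instead of writing through a bad pointer. */
void *alloc_recv_buffer(int count, size_t elem_size)
{
    if (count <= 0 || elem_size == 0)
        return NULL;
    if ((size_t)count > SIZE_MAX / elem_size)
        return NULL;
    return calloc((size_t)count, elem_size);
}

/* Hypothetical helper: returns 1 if a message of `count` elements fits
 * in a buffer that holds `capacity` elements, and 0 otherwise. */
int recv_count_fits(size_t capacity, int count)
{
    return count >= 0 && (size_t)count <= capacity;
}

/* Intended use before a receive (not compiled here, needs <mpi.h>):
 *
 *   double *buf = alloc_recv_buffer(n, sizeof *buf);
 *   if (buf == NULL || !recv_count_fits((size_t)n, count))
 *       MPI_Abort(MPI_COMM_WORLD, 1);
 *   MPI_Recv(buf, count, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
 */
```

Checks like these only catch the obvious cases; a memory checker is still the more reliable way to find where the bad access actually happens.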
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!