
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-12-09 18:40:11


On Dec 9, 2005, at 10:34 AM, Dilani Perera wrote:

> In the program, the root process sends data to the other processes;
> when the calculation is done, they send the results back to the root
> process.
> To send and receive I am using MPI_Send and MPI_Recv.
> This sending and receiving takes place until a certain condition is
> satisfied.
>
> The program works fine, but it seems that it terminates due to some
> failure.
>
> Inside the program there are several places where memory is
> allocated, but the memory is also deallocated afterwards. When the
> size of the array is larger, it seems like the program does not work
> at all.

You might want to run your program through a memory-checking debugger
such as Valgrind (I think they just recently released a new
version). See the LAM FAQ for information on this.
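
For example, with LAM you can usually start each process under
Valgrind straight from mpirun, something like this (the exact flags
here are just an illustration):

  % mpirun -np 2 valgrind --tool=memcheck --leak-check=yes out A1500.txt

Valgrind will then report invalid reads/writes (and leaks) separately
for each rank, which is usually much more informative than a bare
signal 11.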

> % mpicc -lm -o out Main.c
> % mpirun -v -np 2 out A1500.txt
> 2822 out running on n0 (o)
> 22237 out running on n1
>
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - main()
> --------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error. If
> your process did not finish in error, be sure to include a "return 0"
> or "exit(0)" in your C code before exiting the application.
>
> PID 9940 failed on node n0 (134.153.50.235) due to signal 11.
> --------------------------------------------------------------------------

What this is telling you is that one of your processes died with a
signal eleven, meaning that it seg faulted. This probably indicates
a memory error of some kind.
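
Just to illustrate (this is not your code), a very common way to get
exactly this kind of crash is to receive into a buffer that is smaller
than the count passed to MPI_Recv:

  /* hypothetical sketch, assuming <mpi.h> and <stdlib.h>: the buffer
     holds n doubles, but the receive asks for 2*n, so MPI_Recv can
     write past the end of the allocation and die with signal 11 */
  int n = 1500;
  double *buf = (double *) malloc(n * sizeof(double));
  MPI_Status status;
  MPI_Recv(buf, 2 * n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);

Indexing past the end of an array, using a buffer after it has been
freed, or freeing the same pointer twice will produce the same kind of
failure; a memory-checking debugger will point at the exact line.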

Try using a memory-checking debugger, or see if you can examine any
corefiles that were produced by the segfault.
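
If corefiles aren't being written at all, you may need to raise the
core size limit in the shell before launching mpirun, and then load
the corefile into gdb to get a backtrace. Roughly (the exact commands
depend on your shell and system):

  % ulimit -c unlimited    (sh/bash; "limit coredumpsize unlimited" in csh/tcsh)
  % mpirun -v -np 2 out A1500.txt
  % gdb out core
  (gdb) bt

The "bt" backtrace should show the line in Main.c where the bad memory
access happened (compile with -g to get line numbers).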

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/