
LAM/MPI General User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2003-10-28 11:27:06


On Oct 28, 2003, at 7:06 AM, chellapp_at_[hidden] wrote:

> When I put MPI_Scatter and MPI_Gather in the loop, I get an error
> message like this:
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD).
> I am wondering how to rectify this in MPI. I would highly appreciate
> your help and any suggestions in this regard.

This error message is a bit terse, but it means what it says: one of
the processes LAM thought should be running and communicating (here,
rank 1 in MPI_COMM_WORLD) had already died. This usually means one of
a few things:

   * You have provided the wrong parameters to either scatter or gather
     (or both)
   * You passed an incorrect buffer to one of the processes, resulting
     in a seg fault
   * One of your processes is dying at some other point for some
     unknown reason

Based on your pseudo-code, it is basically impossible to tell where
your problem is. If you can run your application on a small number of
nodes (say, 2-4), you may want to try running it under gdb. That might
show you which process dies first, which can be really helpful in these
kinds of situations. Also look at your buffer usage. Scatter and gather
have some easy-to-miss buffer requirements: the counts are per process,
not totals, and the root's buffer must be large enough to hold the data
for every process. Are you sure you have it right?
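
For comparison, here is a minimal sketch of scatter and gather inside a
loop with the buffers sized the way MPI expects. It is not based on your
code: PER_RANK, NUM_ITERS, and the buffer names are made up for
illustration, and rank 0 is assumed to be the root.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define PER_RANK  4   /* elements each process receives (assumed value) */
#define NUM_ITERS 3   /* number of loop iterations (assumed value) */

int main(int argc, char **argv)
{
    int rank, size, i, iter;
    double *full = NULL;      /* significant only at the root */
    double part[PER_RANK];    /* every rank needs its own local buffer */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* The root buffer must hold PER_RANK elements for EVERY process. */
        full = malloc((size_t)size * PER_RANK * sizeof(double));
        if (full == NULL) {
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        for (i = 0; i < size * PER_RANK; i++) {
            full[i] = (double)i;
        }
    }

    /* Every rank must make the same collective calls in every iteration. */
    for (iter = 0; iter < NUM_ITERS; iter++) {
        /* sendcount is the count sent to EACH rank, not the total. */
        MPI_Scatter(full, PER_RANK, MPI_DOUBLE,
                    part, PER_RANK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (i = 0; i < PER_RANK; i++) {
            part[i] += 1.0;   /* stand-in for the real per-rank work */
        }

        /* recvcount is the count received FROM each rank, not the total. */
        MPI_Gather(part, PER_RANK, MPI_DOUBLE,
                   full, PER_RANK, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    }

    if (rank == 0) {
        printf("full[0] = %g after %d iterations\n", full[0], NUM_ITERS);
        free(full);
    }

    MPI_Finalize();
    return 0;
}
```

Assuming a working LAM installation, something like "mpicc example.c"
followed by "mpirun -np 4 ./a.out" (after lamboot) should run it. If
this works and your code does not, comparing the counts and buffer
sizes against yours may narrow things down.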

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/