On Oct 28, 2003, at 7:06 AM, chellapp_at_[hidden] wrote:
> When I put MPI_scatter and gather in the loop, I get error message
> like this
> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD).
> I am wondering how to rectify this in mpi. I will highly appreciate
> your help and any suggestions in this regard.
This error message is a bit cryptic, but what it means is simple: one
of the processes that LAM expected to be running and communicating
was not. This usually means one of a few things:
* You have provided the wrong parameters to either scatter or gather
(or both)
* You passed an incorrect buffer to one of the processes, resulting
in a seg fault
* One of your processes is dying at some other point for some
unknown reason
Based on your pseudo-code, it is basically impossible to tell where
your problem is. If you can run your application on a small number of
nodes (say, 2-4), you may want to try running it under gdb. That
should show you which process dies first, which can be really helpful
in these kinds of situations. Also look closely at your buffer usage.
Scatter and gather have buffer requirements that are easy to get
wrong: the count arguments are per process, not totals, so the send
buffer on the root (for scatter) and the receive buffer on the root
(for gather) must each hold one block for every process. Are you sure
you have that right?
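To make the buffer requirements concrete, here is a minimal sketch in plain C. It does not call MPI at all; it only models the block layout that MPI_Scatter and MPI_Gather use, so that the size checks are visible. The helper names model_scatter and model_gather are made up for this illustration, and the element type is assumed to be int.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Plain-C model of the block layout used by MPI_Scatter and MPI_Gather.
 * No MPI calls are made; model_scatter and model_gather are hypothetical
 * helpers for illustration only. "count" is the per-process element
 * count, as in the real calls. */

/* Copy the block that rank `rank` would receive from the root's send
 * buffer. Returns 0 on success, -1 if either buffer is too small. */
int model_scatter(const int *root_buf, size_t root_len,
                  int nprocs, int count, int rank,
                  int *recv_buf, size_t recv_len)
{
    assert(nprocs > 0 && count > 0 && rank >= 0 && rank < nprocs);

    /* The root's send buffer must hold count elements for EVERY rank. */
    if (root_len < (size_t)nprocs * (size_t)count)
        return -1;
    /* Each receiver needs room for count elements. */
    if (recv_len < (size_t)count)
        return -1;

    /* Rank r gets elements [r * count, (r + 1) * count). */
    memcpy(recv_buf, root_buf + (size_t)rank * (size_t)count,
           (size_t)count * sizeof(int));
    return 0;
}

/* Copy rank `rank`'s block into the slot it would occupy in the root's
 * receive buffer. Returns 0 on success, -1 if either buffer is too small. */
int model_gather(const int *send_buf, size_t send_len,
                 int nprocs, int count, int rank,
                 int *root_buf, size_t root_len)
{
    assert(nprocs > 0 && count > 0 && rank >= 0 && rank < nprocs);

    /* Each sender must actually own count elements. */
    if (send_len < (size_t)count)
        return -1;
    /* The root's receive buffer must hold count elements for EVERY rank. */
    if (root_len < (size_t)nprocs * (size_t)count)
        return -1;

    memcpy(root_buf + (size_t)rank * (size_t)count, send_buf,
           (size_t)count * sizeof(int));
    return 0;
}
```

For example, with 3 processes, a count of 2, and a root buffer of {10, 11, 20, 21, 30, 31}, rank 1 ends up with {20, 21}. If the root buffer only had 2 elements (a total mistaken for a per-process count), the model returns -1; a real MPI program would instead read or write past the end of the buffer, which is one way to get the seg fault mentioned above.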
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/