On Mar 20, 2005, at 12:12 AM, Kumar, Ravi Ranjan wrote:
> I wrote a code in C++ using MPI. It works fine and gives correct result for
> smaller 3D array size case for e.g. T[51][51][51]. However, my code hangs when
> I try to run the same for larger size case i.e T[101][101][101] with an error
> message as below:
>
> MPI_Recv: message truncated (rank 0, MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - main()
<snip>
> I read sometime ago that this may be due to mismatch in number of data sent and
> number of data received in MPI_Send/MPI_Recv process. I have checked this thing
> many times and found no mismatch in number of data exchanged, still I am
> getting this error. What can be the reason for this? Could anyone please
> explain?
The reason is exactly as you surmised. A message has arrived that is
bigger than the buffer you posted, meaning the matching send carried
more elements than the count you passed to MPI_Recv. It's hard to tell
why this is occurring, but I would look carefully at your send/recv
pairs again. These are hard ones to debug, as LAM is in an error
condition and doesn't give you much information about what happened. I
notice you are using blocking receives. This helps a little bit, in
that you can print out what messages are being sent (and their sizes)
and you can print out the size of the buffer you are providing to
MPI_Recv. Also check for wildcards: if you send a big message and post
an ANY_SOURCE recv with a smaller buffer, Murphy's law pretty much
guarantees the messages will be matched in the worst order possible.
Hope that helps,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have an LAM/MPI day: http://www.lam-mpi.org/