On Dec 6, 2004, at 9:31 AM, Atle Svandal wrote:
> Thanks for the help. I finally resolved the problem by using
> MPI_Sendrecv(...).
Excellent. That's one of the main rationales behind MPI_SENDRECV --
MPI can make progress on both actions simultaneously, even if one of
them blocks.
> I still don't understand why it sometimes stalls and sometimes doesn't,
> but I bet I got a much more stable code now.
The issue is that MPI is *allowed* to block in MPI_SEND until a
matching receive is posted, but is not required to. Most MPI
implementations use different protocols depending on message size.
If the message is under a certain size, it is sent eagerly and there
is usually no delay (i.e., MPI_SEND doesn't block). Over that size,
messages may be sent with a rendezvous protocol, meaning that the
message won't actually be sent until a matching receive is posted.
In that case, MPI_SEND blocks until a matching receive is posted, so
two processes that each send to the other before receiving can stall
on large messages but not on small ones.
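To make the "sometimes stalls, sometimes doesn't" behavior concrete,
here is a toy model in Python. It is only a sketch of the reasoning
above, not MPI code: it calls no MPI library, the function names are
made up for illustration, and the 64 KB cutoff is an arbitrary
assumption rather than the value used by any particular implementation
or RPI module.

```python
# Toy model only: this is not MPI code and does not call any MPI library.
# EAGER_LIMIT_BYTES is a made-up cutoff; real cutoffs depend on the
# MPI implementation and on the network device in use.
EAGER_LIMIT_BYTES = 64 * 1024


def send_waits_for_receive(nbytes, eager_limit=EAGER_LIMIT_BYTES):
    """Return True if, in this model, a blocking send of nbytes uses
    rendezvous and therefore waits for a matching receive."""
    return nbytes > eager_limit


def send_then_recv_completes(nbytes_a, nbytes_b, eager_limit=EAGER_LIMIT_BYTES):
    """Model two processes that each do a blocking send to the other
    and only then post a receive.

    The exchange stalls only when both sends wait for a receive,
    because then neither process ever gets as far as posting one.
    """
    a_waits = send_waits_for_receive(nbytes_a, eager_limit)
    b_waits = send_waits_for_receive(nbytes_b, eager_limit)
    return not (a_waits and b_waits)


if __name__ == "__main__":
    for size in (1024, 1024 * 1024):
        ok = send_then_recv_completes(size, size)
        print(size, "bytes each way:", "completes" if ok else "stalls")
```

In this model the same send-then-receive code completes for 1 KB
messages and stalls for 1 MB messages, which is the kind of
size-dependent behavior described above. A combined send/receive call
such as MPI_SENDRECV avoids the problem because it does not depend on
which side of the cutoff the message falls.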
The use of "small" and "large" message protocols is fairly common --
the large message rendezvous protocol is used to prevent resource
exhaustion at the receiver. Note that the size cutoff between "small"
and "large" not only varies between MPI implementations, but is also
likely to vary between devices / network types (e.g., it differs
between the LAM RPI modules). See the LAM/MPI User Guide (in the MPI
SSI chapter) for the SSI parameters that change the cutoff sizes
between small and large for each RPI module.
I'm waving my hands a bit here and avoiding discussing some of the
details, but you get the general idea.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/