No, this should not happen.
Double check that your messages match properly (i.e., they're going on
the same communicator to the intended rank with the right tag). A
common cause for this is an unintentional mismatch (e.g., tag is X on
the sender and Y on the receiver).
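To make the matching rule concrete, here is a small toy model in plain C. It is only a sketch, not LAM/MPI code: the `envelope` struct, the `TOY_ANY_*` constants, and both functions are names invented for illustration, and it assumes every receive is posted on the single destination rank. It shows that a receive pairs with a message only when communicator, source, and tag all agree, which is how a sent count and a received count can end up disagreeing.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for MPI's wildcards; these are NOT the real MPI constants. */
#define TOY_ANY_SOURCE (-1)
#define TOY_ANY_TAG    (-1)

/* The "envelope" fields used to pair a send with a receive. */
typedef struct {
    int comm;   /* communicator id (hypothetical integer handle) */
    int rank;   /* sender's rank on a message; requested source on a receive */
    int tag;    /* message tag */
} envelope;

/* A receive matches a message only if communicator, source, and tag agree. */
bool envelope_matches(envelope msg, envelope recv)
{
    return msg.comm == recv.comm
        && (recv.rank == TOY_ANY_SOURCE || recv.rank == msg.rank)
        && (recv.tag == TOY_ANY_TAG || recv.tag == msg.tag);
}

/* Count how many sent messages are consumed by the posted receives.
   Each receive consumes at most one message: the earliest unmatched one.
   Only the first 64 messages are considered in this sketch. */
size_t count_matched(const envelope *msgs, size_t nmsgs,
                     const envelope *recvs, size_t nrecvs)
{
    bool used[64] = { false };
    size_t matched = 0;
    for (size_t r = 0; r < nrecvs; r++) {
        for (size_t m = 0; m < nmsgs && m < 64; m++) {
            if (!used[m] && envelope_matches(msgs[m], recvs[r])) {
                used[m] = true;
                matched++;
                break;
            }
        }
    }
    return matched;
}
```

In this model, two messages sent with tag 42 against one receive for tag 42 and one for tag 43 give a matched count of 1: the second message was sent but is never received.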
One way to help identify where this happens is to (perhaps temporarily)
change all calls to MPI_Send to MPI_Ssend in your application. MPI_Ssend
is a synchronous send: it won't complete until a matching receive has
been posted. That way, a send with no matching receive will hang at the
offending call instead of completing silently.
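One low-effort way to make that swap without editing every call site is a preprocessor redirect. This is only a sketch: the header name `debug_ssend.h` and the `DEBUG_SSEND` macro are made up for illustration and are not part of LAM/MPI. It relies on MPI_Ssend taking the same arguments as MPI_Send.

```c
/* debug_ssend.h: hypothetical debugging header, not part of LAM/MPI.
   Include it in each source file in place of a bare <mpi.h>
   and build with -DDEBUG_SSEND to enable the swap. */
#include <mpi.h>

#ifdef DEBUG_SSEND
/* MPI_Ssend has the same argument list as MPI_Send, so a textual
   substitution compiles unchanged. */
#define MPI_Send MPI_Ssend
#endif
```

Note that this only touches blocking MPI_Send calls; non-blocking MPI_Isend calls would need the analogous change to MPI_Issend. If the application deadlocks only with the swap enabled, it was probably relying on MPI_Send buffering the message, which is worth fixing in its own right.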
Also, if you are certain that your code is posting matching sends and
receives, try running your application through a memory-checking
debugger (such as Valgrind; see the LAM FAQ for important information
about this) to see whether memory corruption is occurring somewhere, in
which case you're simply getting run-of-the-mill nondeterministic
behavior.
Hope that helps.
Craig Lam wrote:
> Hey Everyone,
>
> I'm running LAM/MPI on a 16 node cluster over TCP on two interconnects,
> Gigabit Ethernet and Infiniband. My application appears to be sending
> messages that are never received on the destination side. Is this
> possible? Has anyone seen similar results? This sounds crazy, I know,
> but I've set up some pretty fool-proof tests to count the number of sent
> and received messages, and the numbers contradict. Does anyone have any
> ideas for why this might be?
>
> Thanks,
> Craig
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/