LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2001-02-17 20:08:11


On Fri, 16 Feb 2001, Bill McKie wrote:

> While exploring a comparison of lam/mpi mpirun options -c2c vs -lamd
> with one of our production programs, I noticed that the following
> error exit would sometimes occur when using -lamd mode:
>
> MPI_Isend: internal MPI error: Unknown error 464 (rank 0, MPI_COMM_WORLD)
> Rank 0: Call stack within LAM:
> Rank 0: - MPI_Isend()
> Rank 0: - main()
> 13716 (n0) exited with status 464

This is actually a known (and intended) feature of LAM. It is called GER
-- Guaranteed Envelope Resources. See the following paper about GER -- it
explains what an MPI implementation has to go through to deliver messages
reliably:

        http://www.lam-mpi.org/lam/papers/delivery.paper/

By default, mpirun enables GER for -lamd mode, and disables it for -c2c
mode. The only real reason for this is histerical raisins. You can
disable GER for -lamd with the -nger option:

        mpirun -lamd -nger N a.out

Be aware, however, that if you disable GER mode, it is possible to cause
serialization and/or deadlock in your MPI application. In the lamd mode,
a pseudo-daemon in the lamd is used for buffer allocation of incoming
messages (the bufferd). If the bufferd on n0 runs out of buffers, it will
wait until some buffers become free before allowing any new messages to be
accepted. That is -- *all* new messages sent to n0 will be blocked until
the rank(s) on n0 receive the messages that are pending in the lamd.
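
To make the node-wide blocking concrete, here is a toy model in Python. It
is only a sketch of the behavior described above, not LAM code: the class
name, the method names, and the idea of a fixed buffer count are all
made up for illustration.

```python
from collections import deque


class ToyBufferd:
    """Toy model (hypothetical) of one buffer pool shared by all ranks on a node."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed fixed number of message buffers
        self.pending = deque()    # (dest_rank, payload) pairs not yet received

    def try_deliver(self, dest_rank, payload):
        """Return True if the message was buffered, False if the sender must wait."""
        if len(self.pending) >= self.capacity:
            # Pool exhausted: senders to *any* rank on this node are blocked.
            return False
        self.pending.append((dest_rank, payload))
        return True

    def receive(self, dest_rank):
        """dest_rank receives its oldest pending message, freeing one buffer."""
        for i, (rank, payload) in enumerate(self.pending):
            if rank == dest_rank:
                del self.pending[i]
                return payload
        return None


# Two ranks (0 and 1) live on n0; rank 0 lets its messages pile up.
n0 = ToyBufferd(capacity=2)
first = n0.try_deliver(0, "a")        # buffered
second = n0.try_deliver(0, "b")       # buffered, pool is now full
blocked = n0.try_deliver(1, "c")      # refused, even though rank 1 is idle
n0.receive(0)                         # rank 0 finally receives one message
retry = n0.try_deliver(1, "c")        # now there is room again
```

The point of the model is the third call: a message for rank 1 is held up
purely because rank 0 has not drained its own pending messages.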

This doesn't typically happen because most codes don't allow large numbers
of messages to go unreceived, but it is possible (even in non-blocking
mode!).

This can also happen in c2c mode, but the consequences are less drastic.
The OS takes care of most of the buffering here, since both the buffer
endpoints and the network itself can "store" messages that have not yet
been received. When that buffering fills up, only the single socket
between the two ranks involved is blocked, not the entire node where the
receiving rank resides.
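
For contrast, the same kind of toy model for c2c mode keeps one queue per
socket. Again, this is a sketch under assumptions: the class, its methods,
and the per-socket capacity are invented for illustration, and real OS
socket buffering is counted in bytes, not messages.

```python
from collections import defaultdict, deque


class ToyC2C:
    """Toy model (hypothetical) with one independent buffer per socket."""

    def __init__(self, per_socket_capacity):
        self.capacity = per_socket_capacity  # assumed limit per rank pair
        self.sockets = defaultdict(deque)    # (src_rank, dest_rank) -> payloads

    def try_send(self, src, dest, payload):
        """Return True if the message was buffered, False if this sender must wait."""
        queue = self.sockets[(src, dest)]
        if len(queue) >= self.capacity:
            return False  # only this one socket is blocked
        queue.append(payload)
        return True

    def receive(self, src, dest):
        """dest receives the oldest pending message from src, if there is one."""
        queue = self.sockets[(src, dest)]
        return queue.popleft() if queue else None


# Rank 0 ignores messages from rank 1; rank 2 is unaffected.
net = ToyC2C(per_socket_capacity=2)
net.try_send(1, 0, "a")
net.try_send(1, 0, "b")
stuck = net.try_send(1, 0, "c")       # socket 1 -> 0 is full
other = net.try_send(2, 0, "d")       # socket 2 -> 0 still has room
```

Here the full 1 -> 0 socket stalls only rank 1; rank 2 can keep sending to
the same receiver.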

The most obvious sign of this is when a user asks why MPI_Send() blocks
(recall that the MPI spec says that MPI_SEND may or may not block). In
LAM/MPI's c2c mode, this can happen because the receiver has let too many
messages go unreceived and all the buffering space is full; the sender
will be blocked until the receiver actually receives some of the pending
messages.

Another reason MPI_Send() can block is when you send a long message and
the receiver has not posted a receive yet, but that's been discussed on
this list many times already -- I mention it here only for completeness.

Again, this typically doesn't happen unless you have a [very] poorly
behaved parallel application, but I thought I'd mention it.

-----

Multiple people have asked about this before; it is possible (likely?)
that we will change the default behavior of "mpirun -lamd" in future
versions of LAM/MPI.

{+} Jeff Squyres
{+} squyres_at_[hidden]
{+} Perpetual Obsessive Notre Dame Student Craving Utter Madness
{+} "I came to ND for 4 years and ended up staying for a decade"

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/