
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2001-12-13 20:03:24


On Thu, 13 Dec 2001, Byna Surendra wrote:

> * What is the default amount of buffer allocated by a system for MPI
> application?

It depends on what you mean here. Keep in mind that these answers are
specific to LAM -- other implementations may handle buffer management
differently.

Let's take the simple case -- TCP communication. In this case, LAM sets
the socket buffering size to the large message size (defaults to 64k) on
the socket to each other MPI process. Hence, the total for each process
will be ((number of processes in MPI_COMM_WORLD - 1) * 64k). This is an
option on the socket itself, so the buffering memory will be in the
kernel. This buffering is only for expedience -- it ensures that small
messages can be written directly to the kernel without blocking.
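
As a rough illustration of the arithmetic and of what "an option on the
socket" can look like, here is a minimal sketch in C. It assumes the
buffering is requested through the standard SO_SNDBUF / SO_RCVBUF socket
options; that is a guess for illustration, not a copy of LAM's source. The
function names and the LARGE_MSG_BYTES constant are made up for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Assumption: the 64k default large-message size mentioned in the text. */
#define LARGE_MSG_BYTES (64 * 1024)

/* Total kernel buffering one process asks for: one socket per peer. */
size_t total_socket_buffering(int nprocs, size_t per_peer_bytes)
{
    assert(nprocs >= 1);
    return (size_t)(nprocs - 1) * per_peer_bytes;
}

/* Ask the kernel for `bytes` of send and receive buffering on `fd`.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int request_socket_buffering(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) != 0)
        return -1;
    return 0;
}

/* Read back the send buffer size the kernel actually granted, or -1.
 * The kernel may round or cap the requested value. */
int granted_send_buffer(int fd)
{
    int bytes = 0;
    socklen_t len = sizeof bytes;
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, &len) != 0)
        return -1;
    return bytes;
}
```

With 8 processes and the 64k default, total_socket_buffering(8,
LARGE_MSG_BYTES) comes to 7 * 64k for each process.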

Other communication devices, like shared memory, will allocate a common
block of memory that is shared between multiple processes. User messages
are written there by the sender and then read by the receiver.

Another device, myrinet, needs to have fixed "special" memory allocated
for communication. I forget what the defaults are offhand.

Other than that, it's whatever the user malloc's for message management --
message memory management (for the most part) is the user's responsibility.

> * What happens when a send message size is more than the buffer size
> allocated by the system?

I'm not sure what you mean here. If you try to receive a large message in
a small buffer, you'll likely cause a system error such as a segmentation
fault.

Remember: the user has to allocate memory for the sending buffer and the
receiving buffer. So if you don't allocate enough memory in the receiving
buffer, your process will likely crash and burn.

> * where does this memory reside?

Socket buffering is in the kernel. Shared memory is directly accessible
between multiple processes. "Special" myrinet memory is real physical
memory that has been pinned by the kernel, so it's in the process space as
well. You can actually run out if you're not careful (e.g., have lots of
outstanding sends/recvs such that there's no more memory to pin).

> * how do MPI_send and MPI_recv use this buffer?

It's different for each device.

TCP: MPI_Send writes the user buffer down the socket. The kernel may or
may not buffer it before sending it out on the network. MPI_Recv reads
from a socket, which may or may not have been buffered by the kernel.

Shmem: Described above -- MPI_Send writes to the shmem, MPI_Recv reads
from the shmem.

Myrinet: If the message is tiny, it is copied to a pre-pinned buffer and
sent from there. If the message is short, it is copied to a longer
pre-pinned buffer and sent from there (tiny messages are sent in one
Myrinet message, while short messages are sent as two Myrinet messages).
If the message is long, the user's buffer is pinned and the message is
sent from its original location, if the OS supports it. Solaris does
not, so there we have to pin a new buffer and copy the message before
sending. For MPI_Recv, it's pretty much the same.
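
The tiny/short/long decision can be sketched as a small classifier. The
size thresholds below are placeholders invented for the example (the
actual cutoffs are not given here), and the type and function names are
hypothetical; only the copy-versus-pin-in-place logic follows the
description above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { PROTO_TINY, PROTO_SHORT, PROTO_LONG } send_proto;

/* Placeholder thresholds, NOT the real defaults. */
#define TINY_MAX_BYTES  1024
#define SHORT_MAX_BYTES (64 * 1024)

/* Pick a protocol purely from the message length. */
send_proto choose_protocol(size_t len)
{
    if (len <= TINY_MAX_BYTES)
        return PROTO_TINY;
    if (len <= SHORT_MAX_BYTES)
        return PROTO_SHORT;
    return PROTO_LONG;
}

/* Does the message get copied before it is sent?
 * Tiny and short messages are always copied to a pre-pinned buffer.
 * Long messages are sent in place when the OS can pin the user's
 * buffer; otherwise (the Solaris case) they are copied to a newly
 * pinned buffer first. */
int needs_copy(send_proto p, int os_can_pin_user_buffer)
{
    assert(p == PROTO_TINY || p == PROTO_SHORT || p == PROTO_LONG);
    if (p == PROTO_LONG)
        return !os_can_pin_user_buffer;
    return 1;
}
```

For example, needs_copy(choose_protocol(len), 0) is 1 for every length,
which matches the Solaris behavior described above.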

> * is there any difference between LAM implementation and MPICH
> implementation of buffer management?

Most likely. You'll have to ask the MPICH folks.

There's also a whole buffer management scheme for message envelopes -- the
meta data that is sent with each MPI message. Generally speaking, there
are pre-allocated buffers for these (and LAM allocates more if it needs
them).

{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/