-------------------------------------------------------------
Amit Rudra
School of Information Systems
Curtin University of Technology
Perth, Australia
-------------------------------------------------------------

>>> jsquyres@lam-mpi.org 14/12/2001 9:03:24 am >>>

On Thu, 13 Dec 2001, Byna Surendra wrote:

> * What is the default amount of buffer allocated by a system for MPI
> application?

It depends on what you mean here. Keep in mind that these answers are
specific to LAM -- other implementations may handle buffer management
differently.

Let's take the simple case -- TCP communication. In this case, LAM sets
the socket buffering size to be the large message size (defaults to 64k)
for each other MPI process. Hence, the total per process will be
((size of MPI_COMM_WORLD - 1) * 64k). This is an option on the socket
itself, so the buffering memory will be in the kernel. This buffering
is only for expedience -- it ensures that small messages can be written
directly to the kernel without blocking.
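
To make the arithmetic above concrete, here is a minimal sketch in C. It is
not LAM's source; the function names are made up for illustration, and it
assumes "64k" means 64 * 1024 bytes. The only real API used is the POSIX
setsockopt() call with SO_SNDBUF and SO_RCVBUF.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Total kernel socket buffering for one process: one buffer per *other*
 * process. world_size is the number of processes in MPI_COMM_WORLD.
 * (Hypothetical helper, not a LAM function.) */
size_t total_socket_buffering(int world_size, size_t per_peer_bytes)
{
    if (world_size < 2)
        return 0;                       /* no peers, no sockets */
    return (size_t)(world_size - 1) * per_peer_bytes;
}

/* Ask the kernel for send and receive buffers of `bytes` on socket `fd`.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int set_socket_buffering(int fd, int bytes)
{
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) != 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes) != 0)
        return -1;
    return 0;
}
```

For example, with 8 processes and 64k per peer,
total_socket_buffering(8, 64 * 1024) gives 7 * 65536 = 458752 bytes. Note
that the kernel is free to adjust the size you request, so treat the number
as a request rather than a guarantee.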

Other communication devices, like shared memory, will allocate a common
block of memory that is shared between multiple processes. User messages
are written there by the sender and then read by the receiver.
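
The shared-memory idea can be sketched in plain C as follows. This is only
an illustration of "sender writes, receiver reads" through a common block,
not how LAM's shared memory device is written: the function name is
hypothetical, the block comes from an anonymous shared mmap(), and waitpid()
stands in for the real synchronization a message-passing device would need.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE                 /* for MAP_ANONYMOUS on glibc */
#endif
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Pass the string `msg` from a child ("sender") to the parent ("receiver")
 * through a block of memory shared by both processes.
 * `out` must hold `cap` bytes. Returns 0 on success, -1 on failure. */
int shmem_roundtrip(const char *msg, char *out, size_t cap)
{
    size_t len = strlen(msg) + 1;       /* include the terminating NUL */
    if (len > cap)
        return -1;                      /* receiver buffer too small */

    /* Common block, visible to both processes after fork(). */
    char *shared = mmap(NULL, cap, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, cap);
        return -1;
    }
    if (pid == 0) {                     /* sender: write into the block */
        memcpy(shared, msg, len);
        _exit(0);
    }

    int status = 0;                     /* receiver: wait, then read */
    int rc = -1;
    if (waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        memcpy(out, shared, len);
        rc = 0;
    }
    munmap(shared, cap);
    return rc;
}
```

The point to notice is that the message is copied twice (once in, once out)
and never goes through the kernel's networking code.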

Another device, Myrinet, needs to have fixed "special" memory allocated
for communication. I forget what the defaults are offhand.

Other than that, it's whatever the user malloc's for message management --
message memory management (for the most part) is the user's responsibility.

> * What happens when a send message size is more than the buffer size
> allocated by the system?

I'm not sure what you mean here. If you try to receive a large message in
a small buffer, you'll likely cause a system error like a segmentation
fault, or the like.

Remember: the user has to allocate memory for the sending buffer and the
receiving buffer. So if you don't allocate enough memory in the receiving
buffer, your process will likely crash and burn.

> * where does this memory reside?

Socket buffering is in the kernel. Shared memory is directly accessible
between multiple processes. "Special" Myrinet memory is real physical
memory that has been pinned by the kernel, so it's in the process space as
well. You can actually run out if you're not careful (e.g., have lots of
outstanding sends/recvs such that there's no more memory to pin).

> * how do MPI_send and MPI_recv use this buffer?

It's different for each device.

TCP: MPI_Send writes the user buffer down the socket. The kernel may or
may not buffer it before sending it out on the network. MPI_Recv reads
from a socket, where the data may or may not have been buffered by the
kernel.
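
As a rough picture of the TCP case, the sketch below writes a user buffer
down a connected socket and reads it back. It is an assumption-laden
stand-in, not LAM code: the helper names are invented, and a local
socketpair() is used in place of a real TCP connection so the example needs
no network setup. Because the message is small, the kernel buffers it and
the write returns before anything has been read.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all `len` bytes of `buf` to `fd`, retrying on short writes and
 * EINTR. Returns 0 on success, -1 on error. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly `len` bytes from `fd` into `buf`.
 * Returns 0 on success, -1 on error or early end-of-file. */
int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;                  /* peer closed before we got it all */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Send a small message to ourselves over a connected socket pair.
 * The write completes into the kernel's socket buffer; the read then
 * drains it. Only safe for messages that fit in that buffer. */
int loopback_roundtrip(const void *msg, void *out, size_t len)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int rc = write_all(sv[0], msg, len);
    if (rc == 0)
        rc = read_all(sv[1], out, len);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

With a message larger than the socket buffer, the same write would block
until the other side started reading, which is why the buffering matters.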

Shmem: Described above -- MPI_Send writes to the shmem, MPI_Recv reads
from the shmem.

Myrinet: If the message is tiny, it is copied to a pre-pinned buffer and
sent from there. If the message is short, it is copied to a longer
pre-pinned buffer and sent from there (tiny messages are sent in one
Myrinet message, while short messages are sent as two Myrinet messages).
If the message is long, the user's buffer is pinned and the message is
sent from its original location, if the OS supports it. Solaris does
not; there, we have to pin a new buffer and copy the message before
sending. For MPI_Recv, it's pretty much the same.
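
The tiny/short/long decision described above might look something like the
sketch below. The names are hypothetical and the size thresholds are left
as parameters, since the actual cut-off values are not given here. Only the
rules stated in the text are encoded: tiny and short messages are always
copied to a pre-pinned buffer, and long messages are copied only when the
OS cannot pin the user's buffer.

```c
#include <stddef.h>

typedef enum { PROTO_TINY, PROTO_SHORT, PROTO_LONG } send_protocol;

/* Pick a protocol from the message length. tiny_max and short_max are
 * hypothetical thresholds in bytes, with tiny_max <= short_max. */
send_protocol choose_protocol(size_t len, size_t tiny_max, size_t short_max)
{
    if (len <= tiny_max)
        return PROTO_TINY;              /* one Myrinet message */
    if (len <= short_max)
        return PROTO_SHORT;             /* two Myrinet messages */
    return PROTO_LONG;                  /* sent from pinned memory */
}

/* Returns 1 if the payload is copied before sending, 0 if it is sent from
 * its original location. os_can_pin_user_buffer is 0 on systems (such as
 * Solaris, per the text) where the user's buffer cannot be pinned. */
int copies_payload(send_protocol p, int os_can_pin_user_buffer)
{
    if (p == PROTO_LONG)
        return !os_can_pin_user_buffer;
    return 1;                           /* tiny and short always copy */
}
```

For instance, with made-up thresholds of 64 and 1024 bytes, a 4096-byte
message is PROTO_LONG and is sent in place only where pinning the user
buffer is supported.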

> * is there any difference between LAM implementation and MPICH
> implementation of buffer management?

Most likely. You'll have to ask the MPICH folks.

There's also a whole buffer management scheme for message envelopes -- the
meta data that is sent with each MPI message. Generally speaking, there
are pre-allocated buffers for these (and LAM allocates more if it needs
them).
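
A pre-allocated pool that grows on demand can be sketched as a simple free
list. Everything here is an assumption made for illustration: the envelope
fields, the pool functions, and the grow-by-one policy are invented, and
LAM's real envelope layout and allocation strategy are not described above.

```c
#include <stdlib.h>

/* Hypothetical envelope: the meta data that travels with a message. */
typedef struct envelope {
    int source, tag, context;
    size_t length;
    struct envelope *next;              /* free-list link */
} envelope;

typedef struct {
    envelope *free_list;
    size_t allocated;                   /* envelopes created so far */
} envelope_pool;

/* Push n fresh envelopes onto the free list. Returns 0, or -1 if out of
 * memory. */
static int pool_grow(envelope_pool *pool, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        envelope *e = calloc(1, sizeof *e);
        if (e == NULL)
            return -1;
        e->next = pool->free_list;
        pool->free_list = e;
        pool->allocated++;
    }
    return 0;
}

/* Pre-allocate `initial` envelopes up front. */
int pool_init(envelope_pool *pool, size_t initial)
{
    pool->free_list = NULL;
    pool->allocated = 0;
    return pool_grow(pool, initial);
}

/* Take an envelope, allocating one more only if the pool is empty. */
envelope *pool_get(envelope_pool *pool)
{
    if (pool->free_list == NULL && pool_grow(pool, 1) != 0)
        return NULL;
    envelope *e = pool->free_list;
    pool->free_list = e->next;
    e->next = NULL;
    return e;
}

/* Hand an envelope back for reuse. */
void pool_put(envelope_pool *pool, envelope *e)
{
    e->next = pool->free_list;
    pool->free_list = e;
}
```

The usual reason for a scheme like this is to keep malloc out of the common
path: most sends and receives reuse an envelope that already exists.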

{+} Jeff Squyres
{+} jsquyres@lam-mpi.org
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/