LAM/MPI General User's Mailing List Archives

From: Byna Surendra (suren_byna_at_[hidden])
Date: 2001-12-14 12:31:35


Hi Jeff,

Thank you very much for the information. When I asked
about the default buffer size, I meant the default
shared buffer allocated for each MPI process. I know
that in the SGI MPI 3.0 implementation, a default
256KB shared buffer is allocated to every MPI process
on its local node. This shared memory is visible to
all the processors in the system (I mean the Origin
2000 system). Based on your answers to my previous set
of questions, I think that it is 64KB in LAM (in both
TCP communication and shared memory communication).
Please correct me if I am wrong.
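
As a minimal sketch of the TCP case described in the quoted
reply below, assuming the 64k default and one socket per other
process in MPI_COMM_WORLD: the function names here are
hypothetical and are not taken from the LAM source. The first
function only does the (N-1)*64k arithmetic; the second asks the
kernel for a send buffer with the standard SO_SNDBUF socket
option and reports what the kernel says it granted, which may
differ from the requested size.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Assumption: the 64k "large message size" default from the quoted reply. */
#define ASSUMED_PER_PEER_BYTES (64 * 1024)

/* Hypothetical helper: total kernel socket buffering for one process,
 * with one socket per *other* process in MPI_COMM_WORLD. */
size_t total_socket_buffer_bytes(int world_size, size_t per_peer_bytes)
{
    if (world_size < 2) {
        return 0; /* no peers, so no per-peer sockets */
    }
    return (size_t)(world_size - 1) * per_peer_bytes;
}

/* Hypothetical helper: request a send buffer size on a fresh TCP socket
 * and return the size the kernel reports, or -1 on any failure. */
int request_send_buffer(int requested_bytes)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    int granted = -1;
    socklen_t len = sizeof granted;

    /* The option is set on the socket itself; the memory lives in the kernel. */
    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &requested_bytes, sizeof requested_bytes) != 0 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &granted, &len) != 0) {
        granted = -1;
    }

    close(fd);
    return granted;
}
```

For example, total_socket_buffer_bytes(4, ASSUMED_PER_PEER_BYTES)
gives 196608 bytes, i.e. 3 peers at 64k each.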

I just started to trace your code in lamsend.c. In
this function, which call exactly is initializing the
communication? I found some _mpi_xxxx functions. Could
you please tell me where I can find the definitions of
these functions? If possible, could you explain the
sequence of operations and calls that occur in
MPI_Send and MPI_Recv, for both TCP communication and
shared memory?

My other question is: how are the processes started
on each processor when mpirun is executed? Is there
one master process that forks as many child processes
as specified by the -np option? When do those child
processes start running: when MPI_Init() is called,
or before that?

I guess I have posted a lot of questions. Thank you
in advance for your patience.

regards,

Suren.
--- lam-request_at_[hidden] wrote:
> Date: Thu, 13 Dec 2001 20:03:24 -0500 (EST)
> From: Jeff Squyres <jsquyres_at_[hidden]>
> To: <lam_at_[hidden]>
> Subject: Re: LAM: MPI Buffer management
> Reply-to: lam_at_[hidden]
>
> On Thu, 13 Dec 2001, Byna Surendra wrote:
>
> > * What is the default amount of buffer allocated by a system for MPI
> > application?
>
> It depends on what you mean here. Keep in mind that these answers are
> specific to LAM -- other implementations may handle buffer management
> differently.
>
> Let's take the simple case -- TCP communication. In this case, LAM sets
> the socket buffering size to be the large message size (defaults to 64k)
> for each other MPI process. Hence, it will be
> ((sizeof(MPI_COMM_WORLD)-1)*64k). This is an option on the socket itself,
> so the buffering memory will be in the kernel. This buffering is only for
> expedience -- it ensures that small messages can be written directly to
> the kernel without blocking.
>
> Other communication devices, like shared memory, will allocate a common
> block of memory that is shared between multiple processes. User messages
> are written there by the sender and then read by the receiver.
>
> Another device, Myrinet, needs to have fixed "special" memory allocated
> for communication. I forget what the defaults are offhand.
>
> Other than that, it's whatever the user malloc's for message management --
> message memory management (for the most part) is the user's
> responsibility.
>
> > * What happens when a send message size is more than the buffer size
> > allocated by the system?
>
> I'm not sure what you mean here. If you try to receive a large message in
> a small buffer, you'll likely cause a system error like a segmentation
> fault, or the like.
>
> Remember: the user has to allocate memory for the sending buffer and the
> receiving buffer. So if you don't allocate enough memory in the receiving
> buffer, your process will likely crash and burn.
>
> > * where does this memory reside?
>
> Socket buffering is in the kernel. Shared memory is directly accessible
> between multiple processes. "Special" Myrinet memory is real physical
> memory that has been pinned by the kernel, so it's in the process space as
> well. You can actually run out if you're not careful (e.g., have lots of
> outstanding sends/recvs such that there's no more memory to pin).
>
> > * how do MPI_Send and MPI_Recv use this buffer?
>
> It's different for each device.
>
> TCP: MPI_Send writes the user buffer down the socket. The kernel may or
> may not buffer it before sending it out on the network. MPI_Recv reads
> from a socket, which may or may not have been buffered by the kernel.
>
> Shmem: Described above -- MPI_Send writes to the shmem, MPI_Recv reads
> from the shmem.
>
> Myrinet: If the message is tiny, it is copied to a pre-pinned buffer and
> sent from there. If the message is short, it is copied to a longer
> pre-pinned buffer and sent from there (tiny messages are sent in one
> Myrinet message, while short messages are sent as two Myrinet messages).
> If the message is long, the user's buffer is pinned and the message is
> sent from its original location, if the OS supports it. Solaris does
> not, so there we have to pin a new buffer and copy the message before
> sending. For MPI_Recv, it's pretty much the same.
>
> > * is there any difference between LAM implementation and MPICH
> > implementation of buffer management?
>
> Most likely. You'll have to ask the MPICH folks.
>
> There's also a whole buffer management scheme for message envelopes -- the
> meta data that is sent with each MPI message. Generally speaking, there
> are pre-allocated buffers for these (and LAM allocates more if it needs
> them).
>
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/