LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-08-16 10:53:19


On Aug 16, 2005, at 10:51 AM, Michael Lees wrote:

> I was thinking some more about the sudden slowdown in my
> application. After reading about the buffered implementation of
> MPI_Send and MPI_Recv, I was wondering whether my performance could
> be dropping when the buffer fills?

Could be, yes.

> Each message is 132 bytes, so the thread design above will allow a lot
> of sends to occur before the internal buffer is full (the buffer is
> the default size, which is 64k?). This could result in 496 messages
> being sent before the buffer fills up.
>
> What happens when a small message stored in the buffer is sent - does
> it empty the used buffer immediately? Or is the buffer reclaimed when
> needed, i.e., when full, or is there some type of garbage collection?

All LAM does is write down the socket with write() (or writev()). So
the buffer is not actually in LAM's control -- it's the TCP socket
implementation in the kernel.
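
To get a feel for that kernel-side buffering, here is a minimal sketch (Python for brevity, not LAM code). It writes 132-byte messages into a socket that nobody reads from and counts how many the kernel accepts before a write would block. The function name is made up for illustration, and it assumes a local socketpair() rather than a real TCP connection between nodes, so the count you see depends on your OS and its socket buffer settings and will not necessarily match the 64k estimate.

```python
import socket

MESSAGE_SIZE = 132  # bytes per message, as in the question above


def count_writes_until_full(message_size=MESSAGE_SIZE, limit=1_000_000):
    """Count messages accepted by the kernel before a write would block.

    Hypothetical helper: uses a local socket pair (an assumption, not a
    real TCP link) and never reads from the receiving end.
    """
    sender, receiver = socket.socketpair()
    sender.setblocking(False)  # report "would block" instead of blocking
    payload = b"x" * message_size
    accepted = 0
    try:
        while accepted < limit:  # safety cap so the loop always ends
            try:
                sent = sender.send(payload)
            except BlockingIOError:
                break  # kernel buffer is full; a blocking write() would stall here
            if sent < message_size:
                break  # partial write: the buffer is effectively full
            accepted += 1
    finally:
        sender.close()
        receiver.close()
    return accepted


if __name__ == "__main__":
    print(count_writes_until_full(), "messages accepted before the write would block")
```

With a blocking socket, which is the case described here, the sender would simply stall at the point where this sketch stops counting.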

You can think of it as a dense, circular buffer such that when the
buffer is full, the write/writev will block. When data is written out
of the buffer onto the physical media, the "end" pointer is moved up to
point to where the next unwritten data is. This allows write/writev()
to copy new data into the now-available section of the circular buffer.

And so on.
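
The circular-buffer picture can be sketched as a toy model (Python for brevity; this is not the kernel's implementation). The RingBuffer class and fill() helper are hypothetical names, the 64k capacity is just the figure from the question, and a rejected write stands in for the point where a real write()/writev() would block. Real kernels also accept partial writes and add per-packet overhead, which this model ignores.

```python
class RingBuffer:
    """Toy model of a fixed-size circular send buffer (byte counts only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.start = 0  # position of the oldest byte not yet on the wire
        self.used = 0   # bytes currently waiting to be transmitted

    def free(self):
        return self.capacity - self.used

    def write(self, nbytes):
        """Copy nbytes in; return False where a real write() would block."""
        if nbytes > self.free():
            return False
        self.used += nbytes
        return True

    def drain(self, nbytes):
        """Model data leaving for the physical media, freeing space."""
        n = min(nbytes, self.used)
        self.start = (self.start + n) % self.capacity  # pointer wraps around
        self.used -= n
        return n


def fill(buf, message_size):
    """Write messages until one no longer fits; return how many fit."""
    count = 0
    while buf.write(message_size):
        count += 1
    return count


if __name__ == "__main__":
    buf = RingBuffer(64 * 1024)
    print(fill(buf, 132))   # 496 messages fit in 65536 bytes
    print(buf.write(132))   # False: the writer would block here
    buf.drain(132)          # one message goes out on the wire
    print(buf.write(132))   # True: space was reclaimed immediately
```

The point of the model is that space is reclaimed as soon as data leaves the buffer; there is no separate garbage-collection step.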

> I can't figure out what is causing the sudden massive drop in
> performance. I've used gkrellm to monitor memory and cpu usage and both
> seem fairly constant and no paging is done at all. The other odd thing
> is that it doesn't matter if I allocate 11 processes to one cpu or 11
> processes to 3 cpus - the performance drop happens at about the same
> point.
>
> Is there a decent free tool for monitoring/profiling mpi programs?

Try replacing your MPI_Sends with MPI_Ssends (synchronous sends) --
these will not complete until the receiver has posted a matching
receive. You can see if you're sending way too many messages in this
case.

There are also a few performance monitoring tools out there such as MPE
(from Argonne). There's also a bunch of non-free vendor solutions out
there.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/