LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2006-05-26 01:20:15


On May 24, 2006, at 2:09 PM, Maureen O Flynn wrote:

> I am running experiments to measure the execution times of
> several MPI_Send() calls made from several nodes at once to
> one receiving node.
>
> The timings sporadically jump from 1000 microsecs up to
> 200,000 microsecs for message sizes in the range of 8k to
> 65536; beyond that range they are fine and follow a
> straight line.
>
> I am wondering if this is due to 'unsafe' programming,
> that is, I do not take system buffering into account when
> sending a number of messages together to one node.
>
> Most of the time the timing is predictable, but for 10-20%
> of runs it jumps very high. I am increasing my message size
> by one byte at a time, and all messages sent are the same size.

After a first look at the code, I don't see anything that would be
unsafe. My guess is that you are filling up kernel buffers at
certain message sizes and starting to block in MPI. For longer
messages, we use a rendezvous protocol that acts as flow control,
which is probably why the jumps go away for large messages.
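
The kernel-buffer effect can be seen in isolation, outside of MPI, with a
rough sketch like the one below. It is not LAM/MPI code and it does not
reproduce the original test program. It assumes a POSIX system and uses a
local socket pair as a stand-in for the TCP connection between two nodes;
the function name bytes_until_would_block is made up for this illustration.
The receiving end is never read, so the sender's writes succeed until the
kernel's socket buffer is full, at which point a blocking send would stall.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of LAM/MPI): writes chunk_size-byte
 * messages into one end of a local socket pair whose other end is never
 * read. Returns the number of bytes the kernel accepted before the next
 * write would have blocked, or -1 if the sockets could not be set up. */
long bytes_until_would_block(size_t chunk_size)
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Non-blocking mode: a full buffer shows up as an error return
     * instead of a stalled write() call. */
    int flags = fcntl(fds[0], F_GETFL, 0);
    if (flags < 0 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char chunk[4096];
    memset(chunk, 0, sizeof chunk);
    if (chunk_size == 0 || chunk_size > sizeof chunk)
        chunk_size = sizeof chunk;   /* clamp to the local buffer */

    long total = 0;
    for (;;) {
        ssize_t n = write(fds[0], chunk, chunk_size);
        if (n > 0) {
            total += n;              /* kernel buffered these bytes */
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;                /* interrupted, try again */
        break;                       /* buffer full (or another error) */
    }

    close(fds[0]);
    close(fds[1]);
    return total;
}
```

Calling bytes_until_would_block(1024) from a small main() reports how much
data the kernel absorbed before the sender would have had to wait. The
number depends on the operating system and its socket buffer settings, so
treat it only as an indication of where blocking can begin.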

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/