On Wednesday, August 13, 2003, at 10:43 AM, Pablo Milano wrote:
> Brian, I don't understand the "progress behaviour" you talk
> about. My program basically works as follows:
>
> 1) Initialize data
> 2) Start the calculation until reaching some condition
> 3) Start all the nonblocking send and receive operations (34 sends and
> 34 receives per process).
> 4) Continue the calculation (this step takes much more time than step 2)
> 5) Wait for all the nonblocking operations to complete
> 6) If more calculation is needed, then go to step 2
>
> The blocking approach is exactly the same (same code) but in step 3 the
> send and receive operations are blocking and, of course, step 5 is omitted.
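Just so we're picturing the same thing, I'd guess the non-blocking version
looks roughly like the sketch below (the neighbor list, message size,
datatype, tag, and function names are all made up - substitute your own):

  #include <mpi.h>

  #define NEXCH 34          /* 34 sends and 34 receives per process */
  #define COUNT 1024        /* made-up message size */

  void long_calculation(void);   /* stand-in for your step 4 */

  void exchange_and_compute(const int neighbor[NEXCH],
                            double sendbuf[NEXCH][COUNT],
                            double recvbuf[NEXCH][COUNT])
  {
      MPI_Request reqs[2 * NEXCH];
      MPI_Status  stats[2 * NEXCH];

      /* step 3: start all the non-blocking operations */
      for (int i = 0; i < NEXCH; ++i) {
          MPI_Irecv(recvbuf[i], COUNT, MPI_DOUBLE, neighbor[i], 0,
                    MPI_COMM_WORLD, &reqs[i]);
          MPI_Isend(sendbuf[i], COUNT, MPI_DOUBLE, neighbor[i], 0,
                    MPI_COMM_WORLD, &reqs[NEXCH + i]);
      }

      /* step 4: the long calculation - no MPI calls happen in here */
      long_calculation();

      /* step 5: everything tries to complete at once */
      MPI_Waitall(2 * NEXCH, reqs, stats);
  }
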
For long messages, here's what is happening. In step 3, LAM is
scrambling to get all the messages started - sending out the headers,
receiving other headers, things like that. Perhaps one or two early
messages get most of the way through, but even that's unlikely. So
there are lots of pending messages for LAM to deal with when you enter
step 4. While you are working on calculations, LAM can't make any
progress on the messages - the entire app is single-threaded and you
aren't calling into LAM at any point. So then you enter step 5 and
try to complete all the messages. Since there are so many pending, all
trying to complete at the same time, things get backed up and take
longer than with MPI_SEND/MPI_RECV.
The behavior described above is for TCP. With LAM 7.0, the Myrinet/gm
interface is capable of some progress "in the background" (i.e., while
you are doing computation). But, of course, Myrinet isn't cheap and
there are still instances where LAM will have to block. Commercial MPI
implementations also generally have some way of making progress on
non-blocking communication while the user is performing computation,
but again, those aren't cheap options.
To better hide the cost of the non-blocking communication, throw the
occasional MPI_Test into the computation; that can prevent the huge spike
of communication in step 5. Every time you enter MPI_Test, LAM will try to
make progress on the messages by refilling the TCP buffers (basically - it's
not quite that simple), so the communication cost should be better hidden.
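For instance, something like this (continuing the sketch above; nchunks and
compute_chunk() are placeholders for however you can slice up your own
step 4):

  /* steps 4 and 5, with step 4 broken into chunks so we can poke
     at the requests in between */
  int flag = 0;
  for (int chunk = 0; chunk < nchunks; ++chunk) {
      compute_chunk(chunk);      /* one slice of the long calculation */

      if (!flag) {
          /* MPI_Testall returns immediately; flag becomes 1 once every
             request has completed.  The call itself is what gives LAM
             a chance to move more data through the TCP buffers. */
          MPI_Testall(2 * NEXCH, reqs, &flag, stats);
      }
  }

  /* step 5: mop up anything that hasn't finished yet */
  if (!flag) {
      MPI_Waitall(2 * NEXCH, reqs, stats);
  }

How finely you chop up the computation is a tradeoff: test too rarely and
the TCP buffers sit full, test too often and you pay the overhead of the
extra calls.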
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/