On Aug 15, 2005, at 7:34 AM, Michael Lees wrote:
> Does the MPI_SENDRECV pattern rely on each node receiving a response to
> a send, i.e., for every send there is one recv?
If you replace "node" with "process", then yes. See
http://www.mpi-forum.org/docs/mpi-11-html/node52.html#Node52 for
details. It simply means that you are doing both a send and a receive,
but are combining them in a single function call.
>> - Do the slaves know if the message they send is going to generate a
>> response from the master?
> No
In this case, it may not be appropriate for the slaves to use SENDRECV.
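To make the pairing concrete, here is a toy model in Python. It is only an illustration under stated assumptions: it uses stdlib queues between one slave and one master, not MPI, and both function names are hypothetical. combined_send_recv stands in for SENDRECV and cannot return until the master answers. send_then_poll sends and then merely checks whether a reply has arrived, which is roughly the role a plain MPI_Send followed by MPI_Iprobe (or MPI_Irecv plus MPI_Test) would play in a real MPI slave that does not know whether a response is coming.

```python
import queue
from typing import Any, Optional


def combined_send_recv(to_master: queue.Queue, from_master: queue.Queue,
                       msg: Any, timeout: float) -> Any:
    """Toy stand-in for SENDRECV (hypothetical): send, then wait for a reply.

    The call cannot return until the master answers. The timeout turns what
    would otherwise be an indefinite hang into a queue.Empty exception.
    """
    to_master.put(msg)
    return from_master.get(timeout=timeout)


def send_then_poll(to_master: queue.Queue, from_master: queue.Queue,
                   msg: Any) -> Optional[Any]:
    """Toy stand-in (hypothetical) for a plain send followed by a
    non-blocking check for a reply, in the spirit of MPI_Send + MPI_Iprobe."""
    to_master.put(msg)
    try:
        return from_master.get_nowait()
    except queue.Empty:
        return None  # no reply yet; the slave can keep working and check later


if __name__ == "__main__":
    to_master, from_master = queue.Queue(), queue.Queue()
    print(send_then_poll(to_master, from_master, "result 1"))  # None: no reply queued
    try:
        combined_send_recv(to_master, from_master, "result 2", timeout=0.5)
    except queue.Empty:
        print("combined send/recv stuck waiting for a reply that never came")
```

The point of the model is the asymmetry: when the master may or may not answer, the combined call leaves the slave blocked, while the separate send-and-poll style lets it carry on.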
>> Check for other OS kinds of slowness -- are you overloading nodes (more
>> processes than processors)? Are you doing significant I/O? Is there a
>> lot of message passing traffic? And so on.
>
> The problem only occurs when I overload, i.e., with more than one slave
> per node (on a one-processor node).
> Unfortunately I want to be able to run experiments with more slaves than
> I have CPUs :)
> Is overloading known to cause serious slowdown, or is there something I
> can do about it?
Yes, it is, especially if your slaves are all actively doing work --
and therefore claiming CPU cycles. See another edition of the MPI Mechanic column for a
description of oversubscription:
http://cw.squyres.com/columns/2004-01-CW-MPI-Mechanic.pdf
Also, when multiple processes are on a single node, LAM will default to
using the usysv RPI, which uses active spin locks for communication
(meaning that processes spin while waiting for a lock). This is
clearly bad in an oversubscription scenario: a spinning process burns
the very CPU cycles that the process it is waiting for needs in order
to make progress. When oversubscribing, you probably want to use the
sysv RPI, because it uses SYSV semaphores for synchronization (forcing
processes to yield the CPU when waiting for a lock). The sysv RPI is a
bit slower (i.e., active spin locks are faster when you don't
oversubscribe), but it may help your overall performance when
oversubscribing.
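The cost of spinning versus blocking can be seen in miniature with the sketch below. It is an analogy, not LAM code, and rests on several assumptions: it uses Python threads inside one process, a busy-polled threading.Event in place of usysv's spin locks, and a threading.Semaphore in place of SYSV semaphores. The helper names are hypothetical. Each one reports how many CPU seconds the waiting thread consumed (measured with time.thread_time()) while its "peer" was busy for delay seconds.

```python
import threading
import time
from typing import Callable


def _waiter_cpu_seconds(wait: Callable[[], object], wake: Callable[[], object],
                        delay: float) -> float:
    """Run wait() in a thread, call wake() after `delay` seconds, and return
    the CPU time the waiting thread used while it waited."""
    used = []

    def waiter() -> None:
        start = time.thread_time()  # CPU time of this thread only
        wait()
        used.append(time.thread_time() - start)

    t = threading.Thread(target=waiter)
    t.start()
    time.sleep(delay)  # the "peer" is busy elsewhere for `delay` seconds
    wake()
    t.join()
    return used[0]


def spin_wait_cpu(delay: float) -> float:
    """Analogy for an active spin lock (hypothetical helper): poll a flag in
    a tight loop, never giving up the CPU voluntarily."""
    flag = threading.Event()

    def spin() -> None:
        while not flag.is_set():
            pass

    return _waiter_cpu_seconds(spin, flag.set, delay)


def blocking_wait_cpu(delay: float) -> float:
    """Analogy for waiting on a semaphore (hypothetical helper): acquire()
    puts the thread to sleep until release() is called."""
    sem = threading.Semaphore(0)
    return _waiter_cpu_seconds(sem.acquire, sem.release, delay)


if __name__ == "__main__":
    print(f"spin wait:     {spin_wait_cpu(0.2):.3f} s CPU")
    print(f"blocking wait: {blocking_wait_cpu(0.2):.3f} s CPU")
```

On an otherwise idle machine the spinning waiter typically reports CPU time close to the full delay, while the blocking waiter reports almost none. On an oversubscribed node, those spun-away cycles are exactly what the other processes sharing the CPU are missing.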
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/