Hi,
Brian's correct about the deadlock. It's amazing how many MPI programs
fail to protect against deadlock by using the non-blocking transfer
routines instead of the blocking ones. I commented on one on this
mailing list only two weeks ago.
MPI does not enforce a "deadlock-safe" programming style. You can have
unsafe programs that work most of the time but fail now and then, e.g.
when the message size is increased. It is a pity that there is no
option or environment variable to make the library check for unsafe
programming. It would slow the program down, but you would only need it
during the "debug" stage of a project.
Regards
Neil
Brian Barrett wrote:
> On Jul 13, 2004, at 1:55 AM, x z wrote:
>
>> Does LAM-MPI optimize a Send whose destination is the local node (and
>> the corresponding Recv whose source is the local node)? How?
>> That is, if I have:
>> if (myID == 0)
>>     for (i = 1; i < 100; i++)
>>         MPI_Send(&x, ..., dest, tag, ...);
>> else
>>     for (i = 1; i < 100; i++)
>>         MPI_Recv(&y, ..., source, MPI_ANY_TAG, ...);
>> and in some cases the send is to the local node and the recv is from
>> the local node.
>> This may happen if the data distribution is irregular and the source
>> and destination are not known until run time. If the transfer is
>> local, there is no need to copy x to a buffer and then copy the
>> buffer to y. But can the LAM-MPI compiler/runtime detect that and
>> take appropriate action, such as converting the Send/Recv to simply
>> y = x?
>
>
> If you call MPI_Send to your rank in MPI_COMM_WORLD and later call
> MPI_Recv from your rank in MPI_COMM_WORLD, LAM will deadlock (this being
> an erroneous program). However, if you had non-blocking receives
> preposted or non-blocking sends or something like that, yes, LAM will
> directly copy from the sending buffer to the receiving buffer.
>
> If you are sending between two processes on the same node, LAM will use
> shared memory for communication (if support for shared memory is built
> into your copy of LAM). In that case, the send buffer is moved into a
> temporary shared space and then into the receiving buffer.
>
> Hope this helps,
>
> Brian
>
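For completeness, here is a minimal sketch of the pre-posted-receive
pattern Brian mentions, for the case where the destination turns out
to be the local rank (x, y, myID, and tag follow the original
question; count and MPI_DOUBLE are placeholders):

    /* Pre-posting the matching receive makes the send-to-self legal:
       with the receive already posted, the blocking send can complete
       (LAM then copies directly from x into y). */
    MPI_Request req;
    MPI_Status  status;

    MPI_Irecv(&y, count, MPI_DOUBLE, myID, tag, MPI_COMM_WORLD, &req);
    MPI_Send(&x, count, MPI_DOUBLE, myID, tag, MPI_COMM_WORLD);
    MPI_Wait(&req, &status);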
--
+-----------------+---------------------------------+------------------+
| Neil Storer | Head: Systems S/W Section | Operations Dept. |
+-----------------+---------------------------------+------------------+
| ECMWF, | email: neil.storer_at_[hidden] | //=\\ //=\\ |
| Shinfield Park, | Tel: (+44 118) 9499353 | // \\// \\ |
| Reading, | (+44 118) 9499000 x 2353 | ECMWF |
| Berkshire, | Fax: (+44 118) 9869450 | ECMWF |
| RG2 9AX, | | \\ //\\ // |
| UK | URL: http://www.ecmwf.int/ | \\=// \\=// |
+--+--------------+---------------------------------+----------------+-+
   | ECMWF is the European Centre for Medium-Range Weather Forecasts |
   +-----------------------------------------------------------------+