
LAM/MPI General User's Mailing List Archives


From: David Cronk (cronk_at_[hidden])
Date: 2004-07-13 10:19:59


Neil Storer wrote:
> Hi,
>
> Brian's correct about the deadlock. It's amazing how many MPI programs
> you see which don't protect against the possibility of deadlocks by
> using the non-blocking transfer routines instead of the blocking ones. I
> commented on one on this mailing list only 2 weeks ago.
>
> MPI does not enforce a "deadlock-safe" programming style. You can have
> unsafe programs that work the majority of the time but fail now and
> then, e.g. if the message size is increased. It is a pity that there is
> no option or environment variable to make the library check for unsafe
> programming. It could slow the program down, but you would only have to
> use it in the "debug" stage of the project.

MPI-CHECK from Iowa State is actually a pretty good tool for detecting
potential deadlock. See http://andrew.ait.iastate.edu/HPC/MPI-CHECK.htm
for details.

Marmot, from Stuttgart, can also do some deadlock detection, but unlike
MPI-CHECK it only finds actual deadlocks, not potential ones. See
http://www.hlrs.de/people/mueller/projects/marmot/index.html for more
details.

Dave.

>
> Regards
> Neil
>
> Brian Barrett wrote:
>
>> On Jul 13, 2004, at 1:55 AM, x z wrote:
>>
>>> Does LAM-MPI optimize a Send whose destination is the local node (and
>>> the corresponding Recv whose source is the local node)? How?
>>> That is, if I have:
>>>     if (myID == 0)
>>>         for (i = 1; i < 100; i++)
>>>             MPI_Send(&x, ..., dest, tag, ...);
>>>     else
>>>         for (i = 1; i < 100; i++)
>>>             MPI_Recv(&y, ..., source, MPI_ANY_TAG, ...);
>>> and in some cases the send is to the local node and the recv is
>>> from the local node.
>>> This may happen if the data distribution is irregular and the source
>>> and destination are not known until run time. If it is local, there
>>> is no need to copy x to the buffer and then copy the buffer to y.
>>> But can the LAM-MPI compiler/runtime detect that and take the
>>> appropriate action, like converting the Send/Recv to simply y = x?
>>
>> If you call MPI_Send to your rank in MPI_COMM_WORLD and later call
>> MPI_Recv from your rank in MPI_COMM_WORLD, LAM will deadlock (this
>> being an erroneous program). However, if you had non-blocking
>> receives preposted or non-blocking sends or something like that, yes,
>> LAM will directly copy from the sending buffer to the receiving buffer.
>>
>> If you are sending between two processes on the same node, LAM will
>> use shared memory for communication (if support for shared memory is
>> built into your copy of LAM). In that case, the send buffer is moved
>> into a temporary shared space and then into the receiving buffer.
>>
>> Hope this helps,
>>
>> Brian
>>
>

-- 
Dr. David Cronk, Ph.D.                             phone: (865) 974-3735
Research Leader                                    fax: (865) 974-8296
Innovative Computing Lab 
http://www.cs.utk.edu/~cronk
University of Tennessee, Knoxville