On Feb 18, 2005, at 2:17 PM, Shi Jin wrote:
> Hi there,
>
> When I was debugging somebody's MPI code, I found
> that code which runs fine with LAM deadlocks with
> MPICH. So I wrote a tiny toy program to test the idea.
> The code is attached at the end.
> The idea is to see whether MPI_Send is blocking
> or not. The code actually only uses 1 process to
> demonstrate the idea.
>
> In LAM under Linux, it runs fine for any SIZE.
> But with MPICH under Linux and MPIPro under Windows,
> for SIZE larger than some value (say 10), the code
> runs into a deadlock.
>
> I checked all the man pages for MPI_Send; they all
> say that MPI_Send "may" block, which I interpret as:
> it blocks under some conditions and not otherwise.
>
> My questions are:
> 1. Any detailed explanation for all that's happened?
You are seeing exactly what you think you are seeing. MPI_Send may or
may not be blocking. This is one of the points in the standard where
the MPI implementation is free to do whatever it wants in terms of
making MPI_Send blocking. The only rule for MPI_Send is that the
buffer passed to it is safe to reuse as soon as MPI_Send returns.
To be truly standards conformant, you should treat MPI_Send like
MPI_Ssend. Of course, if your algorithm doesn't require MPI_Ssend
semantics, you should use MPI_Send in your actual code (as MPI_Ssend
has significant overhead compared to MPI_Send in many cases).
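
To make that concrete, here is a minimal sketch (mine, not the program
you attached) of a standards-safe message to yourself: the receive is
posted with MPI_Irecv before the blocking MPI_Send, so the send can
complete even if the implementation gives it MPI_Ssend semantics.
SIZE is arbitrary.

#include <mpi.h>
#include <stdlib.h>

#define SIZE 100000   /* arbitrary; large enough to exceed a typical eager limit */

int main(int argc, char **argv)
{
    int i, rank;
    double *sendbuf, *recvbuf;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendbuf = malloc(SIZE * sizeof(double));
    recvbuf = malloc(SIZE * sizeof(double));
    for (i = 0; i < SIZE; ++i)
        sendbuf[i] = (double) i;

    /* The receive is posted before the send, so the blocking send can
       complete even if it behaves like MPI_Ssend. */
    MPI_Irecv(recvbuf, SIZE, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD, &req);
    MPI_Send(sendbuf, SIZE, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}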
> 2. Is LAM smarter than the other two to know that the
> code is dealing with a single node so that it won't
> block?
Nope, probably not any smarter. Maybe luckier ;). If you send
increasingly large messages without a posted receive, eventually the
MPI_Send will block.
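
For reference, something along these lines (my own guess at the
pattern you describe, not your attached code) shows the behavior: on
implementations that buffer small messages it happens to return for
small counts, while above that threshold MPI_Send waits for a matching
receive that is never reached, and the program hangs.

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, count;
    double *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    count = (argc > 1) ? atoi(argv[1]) : 10;
    buf = calloc(count, sizeof(double));

    /* Unsafe: relies on the implementation buffering the message.
       Works while "count" is under the eager limit, hangs above it. */
    MPI_Send(buf, count, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD);
    MPI_Recv(buf, count, MPI_DOUBLE, rank, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    free(buf);
    MPI_Finalize();
    return 0;
}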
> 3. In principle, shall I always regard MPI_Send as a
> blocking function and should avoid something like:
> loop rank for all
> {
> MPI_Send( to rank)
> MPI_Recv( from rank)
> }
> Because that's what we ran into in the real code,
> inside a loop where the rank index has a chance to
> be the same as the process's own rank.
The above code snippet violates the MPI standard (I believe the MPI
standard document explicitly mentions this case). A more subtle
version of this is the case where every process does an MPI_Send to
((my_rank + 1) % comm_world_size) then does an MPI_Recv from ((my_rank
- 1) % comm_world_size). This too results in deadlock for
sufficiently large messages, because every process is blocked in
MPI_Send and no one ever reaches its MPI_Recv.
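
Here is a rough sketch of that ring pattern, with the deadlock-prone
ordering left in comments and MPI_Sendrecv as one standards-safe way
to write it. The buffer size is made up, and note the "+ size" in the
left-neighbor computation, since C's % operator can give a negative
result:

#include <mpi.h>

#define COUNT 200000   /* arbitrary, big enough to force blocking */

int main(int argc, char **argv)
{
    int rank, size, right, left;
    static double sendbuf[COUNT], recvbuf[COUNT];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    right = (rank + 1) % size;
    left  = (rank + size - 1) % size;

    /* Deadlock-prone version: every rank blocks in MPI_Send, so nobody
     * ever reaches MPI_Recv once the messages are too big to buffer.
     *
     *   MPI_Send(sendbuf, COUNT, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);
     *   MPI_Recv(recvbuf, COUNT, MPI_DOUBLE, left,  0, MPI_COMM_WORLD,
     *            MPI_STATUS_IGNORE);
     */

    /* Safe version: let the library pair up the send and receive. */
    MPI_Sendrecv(sendbuf, COUNT, MPI_DOUBLE, right, 0,
                 recvbuf, COUNT, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}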
> 4. Finally, the fact that LAM is different from the
> others makes me nervous. Shall I always try to compile
> the code against different implementations and check
> that it runs fine with all of them, to make sure it
> is really portable?
If you (and all your co-developers) follow the MPI standard to the
letter, your code should basically work everywhere with every
implementation. Of course, some deviations from the standard exist in
every implementation (but then you get to yell at the implementor... I
mean be nice to the poor overworked MPI developer...). But as you've
discovered, it's easy to violate the MPI standard without realizing it.
Sometimes, the best you can do is test on a variety of platforms.
(The same is true for MPI implementors - Unix is unfortunately not
unix, as a quick read through the LAM changelog will verify).
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have an LAM/MPI day: http://www.lam-mpi.org/