LAM/MPI General User's Mailing List Archives

From: Aamir Shafi (aamir.shafi_at_[hidden])
Date: 2005-02-18 18:26:07


Shi Jin wrote:

>Hi there,
>
>When I was debugging somebody's MPI code, I found that
>code which runs fine under LAM deadlocks under MPICH. So I
>wrote a tiny toy program to test the idea.
>The code is attached at the end.
>The idea is to see whether MPI_Send is blocking
>or not. The code uses only 1 process to
>demonstrate the idea.
>
>In LAM under linux, it runs fine for any SIZE.
>But for MPICH under linux and MPIPro under windows,
>for SIZE larger than some value (say 10), the code
>runs into a deadlock.
>
>I checked the MPI_Send man pages for all of them;
>they all say that MPI_Send "may" block, which I
>interpret as: it blocks under some conditions and not
>otherwise.
>
>My questions are:
>1. Any detailed explanation for all that's happened?
>
>
Hi, Shi

For a more detailed answer, check out:

http://www.lam-mpi.org/MailArchives/lam/msg09865.php

MPI_Send uses the standard send mode. In standard mode, MPICH uses an
eager-send protocol for messages smaller than 128K. This means your
code will not block for messages smaller than 128K. At or above that
size it will block, because the communication protocol switches to
rendezvous: the sender and the receiver exchange control messages
before the actual transmission takes place, so the send cannot complete
until a matching receive has been posted. This is the behavior for TCP
devices.
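
To make the size rule above concrete, here is a minimal sketch in C. It is only a toy model of the rule as I stated it, not MPICH's actual code: the names send_protocol, choose_protocol, payload_bytes and send_waits_for_receiver are made up for illustration, and the 128K limit is the figure quoted above for TCP devices, so other devices and implementations will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: a toy model, not any MPI library's API. */
typedef enum { PROTOCOL_EAGER, PROTOCOL_RENDEZVOUS } send_protocol;

/* Assumed limit: the 128K figure quoted for MPICH over TCP. */
#define EAGER_LIMIT_BYTES ((size_t)128 * 1024)

/* Payload size of `count` elements of `elem_size` bytes each. */
static size_t payload_bytes(size_t count, size_t elem_size)
{
    /* Guard against overflow in the multiplication. */
    assert(elem_size == 0 || count <= SIZE_MAX / elem_size);
    return count * elem_size;
}

/* Messages below the limit go out eagerly; messages at or above it
 * use rendezvous, where control messages are exchanged first. */
static send_protocol choose_protocol(size_t bytes, size_t eager_limit)
{
    return bytes < eager_limit ? PROTOCOL_EAGER : PROTOCOL_RENDEZVOUS;
}

/* Nonzero if, in this model, the send waits for the receiver. */
static int send_waits_for_receiver(size_t bytes, size_t eager_limit)
{
    return choose_protocol(bytes, eager_limit) == PROTOCOL_RENDEZVOUS;
}
```

For your test program, payload_bytes(9999999, sizeof(double)) is roughly 80 million bytes, far above the limit, so this model predicts a rendezvous send that waits for a receive which is never posted. Since you saw blocking at much smaller sizes, the real thresholds on your systems are evidently different from the one assumed here.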

Because your LAM/MPI code is not blocking, either there is no exchange
of control messages before the actual transmission, or you have not yet
reached LAM's limit. Again, this behavior depends on which communication
device LAM/MPI is using.

>2. Is LAM smarter than the other two to know that the
>code is dealing with a single node so that it won't
>block?
>
>
Again, either there is no exchange of control messages, or quite
possibly you didn't reach that particular limit.

>3. In principle, shall I always regard MPI_Send as a
>blocking function and should avoid something like:
>loop rank for all
>{
>MPI_Send( to rank)
>MPI_Recv( from rank)
>}
>
>
You should always treat MPI_Send as a blocking call and avoid such
send-then-receive loops.
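
One common way to keep blocking calls and still avoid the deadlock is to order each pairwise exchange so that the two sides never both start with a send. The sketch below shows only that ordering decision in plain C; sends_first and ordering_is_consistent are hypothetical helper names, and the rule "the lower rank sends first" is one possible choice, not something mandated by MPI.

```c
#include <assert.h>

/* Hypothetical helper: nonzero if `rank` should send before it
 * receives when exchanging with `peer`. The lower rank sends first,
 * so the two sides of a pair never both begin with a blocking send. */
static int sends_first(int rank, int peer)
{
    assert(rank >= 0 && peer >= 0 && rank != peer);
    return rank < peer;
}

/* Checks, for every pair among `num_ranks` ranks, that exactly one
 * side sends first. Returns nonzero if the ordering is consistent. */
static int ordering_is_consistent(int num_ranks)
{
    for (int a = 0; a < num_ranks; ++a)
        for (int b = a + 1; b < num_ranks; ++b)
            if (sends_first(a, b) == sends_first(b, a))
                return 0;
    return 1;
}
```

Inside your loop, a rank would call MPI_Send and then MPI_Recv when sends_first(rank, peer) is nonzero, and MPI_Recv and then MPI_Send otherwise. This does not cover a process exchanging with itself, as in your test program; for that case, or simply to avoid the bookkeeping, MPI_Sendrecv or a nonblocking MPI_Isend/MPI_Irecv pair followed by a wait is the usual choice.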

>4. Finally, the fact that LAM is different from the
>others makes me nervous. Shall I always try to compile
>the code against different implementations to make
>sure it runs fine with all of them in order to make
>sure it is really portable?
>
>
Check out http://www.lam-mpi.org/faq/category6.php3#question1

Hope it helps,
--Aamir

>Thanks a lot
>
>
>
>/////////////// localLock.c //////////////
>#include <stdio.h>
>#include <stdlib.h>   /* malloc, free */
>#include <mpi.h>
>#define SIZE 9999999
>
>int main(int argc, char * argv[])
>{
> int rank,numProcs;
> double* data;
> double megaSize;
> MPI_Status status;
>
> MPI_Init(&argc,&argv);
> MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
> data=(double*)malloc(SIZE*sizeof(double));
> megaSize=SIZE*sizeof(double)/1024.0/1024.0;
> /* %f expects a double, so pass megaSize without an (int) cast */
> printf("Data size=%f MBytes\n",megaSize);
>
> /* The process sends to rank 0 (itself in a 1-process run) before
>    posting the receive; this blocks whenever the implementation
>    waits for a matching receive. */
> MPI_Send(data,SIZE,MPI_DOUBLE,0,1,MPI_COMM_WORLD);
> MPI_Recv(data,SIZE,MPI_DOUBLE,0,1,MPI_COMM_WORLD,&status);
>
> printf("Done!\n");
> free(data);
> MPI_Finalize();
> return 0;
>}
>/////////////////// End of code
>
>_______________________________________________
>This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>