LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-10-04 14:10:53


Your code has inherent race conditions: because you receive with
MPI_ANY_SOURCE (and MPI_ANY_TAG), each receive can be matched by any
send from any of the other processes. Hence, the MPI_RECV that expects
a lot of data may actually receive only a single int -- and the
MPI_RECV that expects a single int may match a large data send, which
is exactly the "message truncated" error in your output.

It looks like you are not using tags correctly -- you should probably
be using tags to separate the different kinds of messages, and/or
receiving from specific sources to separate messages from different
nodes (e.g., you don't want the start_pos from one node to be mixed
with the data from another).
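
Something like this would avoid the races (just a sketch based on your
posting, not tested -- the TAG_* constants and MAX_SHARE are made up
for illustration, and you'd fill in your real values for share,
start_pos, and nsareanode):

#include <stdio.h>
#include <mpi.h>

#define TAG_START_POS 1   /* one tag per kind of message */
#define TAG_SHARE     2
#define TAG_DATA      3
#define MAX_SHARE     1024

int main(int argc, char *argv[]) {
    int rank, psize, start_pos, share, tmprank;
    float nsareanode[MAX_SHARE];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &psize);

    share = 10;                  /* fill in your real values here */
    start_pos = rank * share;

    if (rank != 0) {
        /* distinct tags keep the three kinds of messages apart */
        MPI_Send(&start_pos, 1, MPI_INT, 0, TAG_START_POS, MPI_COMM_WORLD);
        MPI_Send(&share, 1, MPI_INT, 0, TAG_SHARE, MPI_COMM_WORLD);
        MPI_Send(nsareanode, share, MPI_FLOAT, 0, TAG_DATA, MPI_COMM_WORLD);
    } else {
        for (tmprank = 1; tmprank < psize; tmprank++) {
            int src;
            /* accept the first message from anyone... */
            MPI_Recv(&start_pos, 1, MPI_INT, MPI_ANY_SOURCE,
                     TAG_START_POS, MPI_COMM_WORLD, &status);
            /* ...then lock the rest of the exchange to that sender */
            src = status.MPI_SOURCE;
            MPI_Recv(&share, 1, MPI_INT, src, TAG_SHARE,
                     MPI_COMM_WORLD, &status);
            MPI_Recv(nsareanode, share, MPI_FLOAT, src, TAG_DATA,
                     MPI_COMM_WORLD, &status);
            fprintf(stderr, "got %d floats at offset %d from node %d\n",
                    share, start_pos, src);
        }
    }

    MPI_Finalize();
    return 0;
}

The key points: each kind of message gets its own tag, only the first
receive uses MPI_ANY_SOURCE, and the remaining receives are pinned to
status.MPI_SOURCE, so the start_pos / share / data triple always comes
from the same node. (MPI_Gatherv could also do this whole exchange in
one call, but that's a separate discussion.)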

On Oct 4, 2004, at 2:57 PM, Sriram Rallabhandi wrote:

>
>
> Hi all,
>
> I'm trying to write a trivial program that sends array elements from
> the slave nodes to the root node. Each slave node creates a portion
> of an array and tries to send it to the root node. In the code below,
> the slave nodes send start_pos, share, and the nsareanode array to
> the root node.
>
> For the example output shown below, I'm running on 3 nodes including
> the root. nsareanode is the partial array computed on each node and
> transferred to the root node, which puts it into another matrix,
> nsarea[][]. In this example the total array length is 30, and I'm
> splitting the computation into 3, with ten elements (share) computed
> by each node.
>
> if (rank!=0) {
>         // Send start_pos, share and nsareanode arrays to root node
>         MPI_Send(&start_pos,1,MPI_INT,0,1,MPI_COMM_WORLD);
>         MPI_Send(&share,1,MPI_INT,0,1,MPI_COMM_WORLD);
>         MPI_Send(nsareanode,share,MPI_FLOAT,0,1,MPI_COMM_WORLD);
>         fprintf(stderr,"Node %d sending stuff to Root\n",rank);
> }
> if (rank==0) {
>         for (tmprank=1;tmprank<psize;tmprank++) {
>                 fprintf(stderr,"Root about to receive data from
> Node\n");
>                 MPI_Recv(&start_pos,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TA
> G,MPI_COMM_WORLD,&status);
>                 fprintf(stderr,"start_pos=%d\n",start_pos);
>                 MPI_Recv(&share,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG,MP
> I_COMM_WORLD,&status);
>                 fprintf(stderr,"source=%d\tstart_pos=%d\tshare=%d\n",st
> atus.MPI_SOURCE,start_pos,share);
>                 MPI_Recv(nsareanode,share,MPI_FLOAT,MPI_ANY_SOURCE,1,MP
> I_COMM_WORLD,&status);
>                                 
>                 for (tmpint=start_pos;tmpint<start_pos+share;tmpint++)
> {
>                         nsarea[jjj][tmpint] =
> nsareanode[tmpint-start_pos];
>                         fprintf(stderr,"nsarea[%d][%d]=%f\n",jjj,tmpint
> ,nsarea[jjj][tmpint]);
>                 }               
>         }       
> }
>
>
>
> rank=2  share=10        start_pos=20
> rank=1  share=10        start_pos=10
> rank=0  share=10        start_pos=0
>
> Root about to receive data from Node
> start_pos=20
> source=1        start_pos=20    share=10
> nsarea[0][20]=0.000000
> nsarea[0][21]=53.554237
> nsarea[0][22]=55.088585
> nsarea[0][23]=54.225426
> nsarea[0][24]=58.777073
> nsarea[0][25]=64.479797
> nsarea[0][26]=70.079277
> nsarea[0][27]=75.008041
> nsarea[0][28]=77.893166
> nsarea[0][29]=78.853493
> Root about to receive data from Node
> start_pos=10
> Node 1 sending stuff to Root
> MPI_Recv: message truncated (rank 0, MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (0, MPI_COMM_WORLD):  - main()
> Node 2 sending stuff to Root
>
> -----------------------------------------------------------------------------
>
> One of the processes started by mpirun has exited with a nonzero exit
> code.  This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 13897 failed on node n1 with exit status 1.
>
>
> It looks like MPI_ANY_SOURCE is causing problems in the MPI_Recv
> calls. Could someone tell me what exactly is going on and how to
> implement this correctly?
>
>
> Thanks
> Sriram
>
>
>
>
>
>
>
> -------------------------------------------------------------------------------
> Sriram K. Rallabhandi
> Graduate Research Assistant       Work: 404 385 2789
> Aerospace Engineering                 Res:  404 603 9160
> Georgia Inst. of Technology
>
> -------------------------------------------------------------------------------
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/