Sriram Rallabhandi wrote:
>
>
> Hi all,
>
> I'm trying to write a trivial program that sends array elements from
> the slave nodes to the root node. Each slave node computes a portion
> of an array and sends it to the root node. In the code below, the
> slave nodes send start_pos, share, and the nsareanode array to the
> root node.
>
> For the example output shown below, I'm running on 3 nodes including
> the root. nsareanode is the partial array computed on each node and
> transferred to the root node, which copies it into another matrix
> nsarea[][]. For this example, the total array length is 30 and I'm
> splitting the computation three ways, with ten elements (share)
> computed by each node.
>
> if (rank!=0) {
>     // Send start_pos, share and nsareanode arrays to root node
>     MPI_Send(&start_pos,1,MPI_INT,0,1,MPI_COMM_WORLD);
>     MPI_Send(&share,1,MPI_INT,0,1,MPI_COMM_WORLD);
>     MPI_Send(nsareanode,share,MPI_FLOAT,0,1,MPI_COMM_WORLD);
>     fprintf(stderr,"Node %d sending stuff to Root\n",rank);
> }
> if (rank==0) {
>     for (tmprank=1;tmprank<psize;tmprank++) {
>         fprintf(stderr,"Root about to receive data from Node\n");
>         MPI_Recv(&start_pos,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
>         fprintf(stderr,"start_pos=%d\n",start_pos);
>         MPI_Recv(&share,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
>         fprintf(stderr,"source=%d\tstart_pos=%d\tshare=%d\n",status.MPI_SOURCE,start_pos,share);
>         MPI_Recv(nsareanode,share,MPI_FLOAT,MPI_ANY_SOURCE,1,MPI_COMM_WORLD,&status);
>
>         for (tmpint=start_pos;tmpint<start_pos+share;tmpint++) {
>             nsarea[jjj][tmpint] = nsareanode[tmpint-start_pos];
>             fprintf(stderr,"nsarea[%d][%d]=%f\n",jjj,tmpint,nsarea[jjj][tmpint]);
>         }
>     }
> }
>
>
>
> rank=2 share=10 start_pos=20
> rank=1 share=10 start_pos=10
> rank=0 share=10 start_pos=0
>
> Root about to receive data from Node
> start_pos=20
> source=1 start_pos=20 share=10
> nsarea[0][20]=0.000000
> nsarea[0][21]=53.554237
> nsarea[0][22]=55.088585
> nsarea[0][23]=54.225426
> nsarea[0][24]=58.777073
> nsarea[0][25]=64.479797
> nsarea[0][26]=70.079277
> nsarea[0][27]=75.008041
> nsarea[0][28]=77.893166
> nsarea[0][29]=78.853493
> Root about to receive data from Node
> start_pos=10
> Node 1 sending stuff to Root
> MPI_Recv: message truncated (rank 0, MPI_COMM_WORLD)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - main()
> Node 2 sending stuff to Root
> -----------------------------------------------------------------------------
>
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 13897 failed on node n1 with exit status 1.
>
>
> It looks like MPI_ANY_SOURCE is causing problems in the MPI_Recv
> statement. Could someone tell me what exactly is going on and how to
> implement this correctly?
You are not enforcing an order on how the messages are received. That
is, since you use MPI_ANY_SOURCE and MPI_ANY_TAG, the messages can be
matched in any order. You may receive start_pos from process 2, then
start_pos from process 1, then share from process 2, then the array
from process 2. Eventually a receive that asked for 1 integer matches
a message containing 10 floats, which is why MPI_Recv reports a
truncated message.

One easy way to fix this is to read status.MPI_SOURCE after the first
receive and use it in place of MPI_ANY_SOURCE for the following 2
receives, so that all three messages in one iteration come from the
same process.
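Here is a minimal sketch of that fix, reusing the names from your
program (psize, start_pos, share, nsareanode, nsarea, jjj and status
are assumed to be declared as in your code). Note that a specific tag
works for the first receive too, since all your sends use tag 1:

    if (rank == 0) {
        for (tmprank = 1; tmprank < psize; tmprank++) {
            int src;
            /* The first receive may come from any slave. */
            MPI_Recv(&start_pos, 1, MPI_INT, MPI_ANY_SOURCE, 1,
                     MPI_COMM_WORLD, &status);
            src = status.MPI_SOURCE;   /* remember who sent it */
            /* Pin the remaining receives to that process; MPI
               guarantees messages between one sender/receiver pair
               on the same communicator do not overtake each other. */
            MPI_Recv(&share, 1, MPI_INT, src, 1, MPI_COMM_WORLD,
                     &status);
            MPI_Recv(nsareanode, share, MPI_FLOAT, src, 1,
                     MPI_COMM_WORLD, &status);
            for (tmpint = start_pos; tmpint < start_pos + share; tmpint++)
                nsarea[jjj][tmpint] = nsareanode[tmpint - start_pos];
        }
    }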
For performance, I would suggest putting all the data in a single
message and receiving with a count of MAX, where MAX is some
appropriate upper bound. There is a lot of overhead associated with
message passing, so you want to avoid sending lots of small messages:
a few big messages are better than many small ones.
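A sketch of that approach, assuming an upper bound MAX on share and
again reusing your variable names. It encodes start_pos as the first
element of the float buffer (exact here, since the indices are small
integers) and recovers share from the actual message length with
MPI_Get_count:

    #define MAX 30                 /* assumed upper bound on share */
    float buf[MAX + 1];
    int count, start, i;

    if (rank != 0) {
        buf[0] = (float) start_pos;     /* carry start_pos along */
        for (i = 0; i < share; i++)
            buf[i + 1] = nsareanode[i];
        MPI_Send(buf, share + 1, MPI_FLOAT, 0, 1, MPI_COMM_WORLD);
    } else {
        for (tmprank = 1; tmprank < psize; tmprank++) {
            /* Post the receive with the maximum count; the actual
               message may legally be shorter. */
            MPI_Recv(buf, MAX + 1, MPI_FLOAT, MPI_ANY_SOURCE, 1,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_FLOAT, &count);
            start = (int) buf[0];
            for (i = 0; i < count - 1; i++)
                nsarea[jjj][start + i] = buf[i + 1];
        }
    }

With one message per slave, MPI_ANY_SOURCE is also safe again: each
iteration of the loop matches exactly one complete message.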
Hope this helps.
Dave.
>
>
> Thanks
> Sriram
>
>
>
>
> -------------------------------------------------------------------------------
> Sriram K. Rallabhandi
> Graduate Research Assistant Work: 404 385 2789
> Aerospace Engineering Res: 404 603 9160
> Georgia Inst. of Technology
> -------------------------------------------------------------------------------
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Dr. David Cronk, Ph.D. phone: (865) 974-3735
Research Leader fax: (865) 974-8296
Innovative Computing Lab
http://www.cs.utk.edu/~cronk
University of Tennessee, Knoxville