Note that testing a simple small program is not the same thing as
testing your complex MPI application. The memory interactions, for
instance, will likely be totally different. So even though your
*concept* appears to be right (which is exactly what small test
programs are excellent at validating), this test doesn't say anything
about the correctness of your larger MPI application.
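
If you want a small test to tell you more, one option is to make it look
more like the real code: put the 3D array on the heap (as your application
presumably does) and assert that every slab index you hand to
MPI_Send/MPI_Recv stays inside the allocation.  Here is a rough sketch
only; the names Nx, Ny, local_Nz, and rows_per_process are borrowed from
your snippet, but the sizes and the way T is really laid out are my
guesses, so adjust them to match your application:

#include <mpi.h>
#include <assert.h>
#include <stdlib.h>

/* Placeholder sizes; substitute the real Nx, Ny, rows_per_process. */
#define NX 4
#define NY 4
#define ROWS_PER_PROCESS 8

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int local_Nz = ROWS_PER_PROCESS;
    int total_rows = 2 * ROWS_PER_PROCESS;   /* two ranks in this sketch */

    /* Heap allocation, like a real application; a memory checker can then
       see out-of-range reads/writes that a small stack array might hide. */
    double *T = (double *) calloc((size_t) total_rows * NX * NY,
                                  sizeof(double));
    assert(T != NULL);

    if (rank == 0) {
        int row = local_Nz - 1;
        assert(row >= 0 && row < total_rows);     /* stay inside T */
        MPI_Send(&T[row * NX * NY], NX * NY, MPI_DOUBLE, 1, 99,
                 MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        int row = ROWS_PER_PROCESS * rank - 1;
        assert(row >= 0 && row < total_rows);     /* stay inside T */
        MPI_Recv(&T[row * NX * NY], NX * NY, MPI_DOUBLE, 0, 99,
                 MPI_COMM_WORLD, &status);
    }

    free(T);
    MPI_Finalize();
    return 0;
}

Run it with at least two processes; if the asserts fire here, or if a
memory checker complains about the send/receive buffers, the same
indexing is almost certainly wrong in the full application too.
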
As for valgrind instructions, I refer you to the LAM FAQ for
information on debugging in parallel and to the official Valgrind
documentation; they will explain it much better (and more accurately)
than I can.
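
That said, the usual starting point is just to put valgrind between
mpirun and your executable, e.g. something along the lines of:

    mpirun -np 2 valgrind ./your_program

(./your_program is a placeholder for your own binary; the options you
will eventually want, such as per-process log files and leak checking,
are exactly what the Valgrind manual and the LAM FAQ describe.)
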
On Feb 22, 2005, at 3:25 PM, Kumar, Ravi Ranjan wrote:
> Hi,
>
> I wrote a simple program to check the same thing and it works
> perfectly.
>
> #include<iostream.h>
> #include<math.h>
> #include<fstream.h>
> #define SIZE 10
> #include<time.h>
> #include<iomanip.h>
> #include<mpi.h>
> #include<stdio.h>
> #include<stdlib.h>
>
> int main( int argc, char **argv )
> {
>     int myrank, count = 0;
>     int A[SIZE][SIZE][SIZE], B[100], C[SIZE][SIZE][SIZE], i, j, k, m;
>     MPI_Status status;
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
>
>     if (myrank == 1)
>         for (k = 0; k < 10; k++)
>             for (i = 0; i < 10; i++)
>                 for (j = 0; j < 10; j++)
>                 {
>                     A[k][i][j] = ++count;
>                     cout << A[k][i][j] << " ";
>                 }
>
>     if (myrank == 1)   /* code for process 1 */
>         MPI_Send(&A[0][0][0], 100, MPI_INT, 0, 99, MPI_COMM_WORLD);
>
>     if (myrank == 0)   /* code for process 0 */
>         MPI_Recv(&B[0], 100, MPI_INT, 1, 99, MPI_COMM_WORLD, &status);
>
>     if (myrank == 1)   /* code for process 1 */
>         MPI_Send(&A[9][0][0], 100, MPI_INT, 2, 99, MPI_COMM_WORLD);
>
>     if (myrank == 2)   /* code for process 2 */
>         MPI_Recv(&C[0][0][0], 100, MPI_INT, 1, 99, MPI_COMM_WORLD, &status);
>
>     if (myrank == 2)
>     {
>         cout << "Rank = " << myrank << endl;
>         for (i = 0, m = 0; i < 10; i++)
>             for (j = 0; j < 10; j++, m++)
>                 cout << "B[" << m << "] = " << B[m]
>                      << " C[0][" << i << "][" << j << "] = "
>                      << C[0][i][j] << endl;
>     }
>
>     if (myrank == 0)
>     {
>         cout << "Rank = " << myrank << endl;
>         for (i = 0; i < 100; i++)
>             cout << "B[" << i << "] = " << B[i] << endl;
>     }
>
>     MPI_Finalize();
>     return 0;
> }
>
>
>
> But I don't understand what is wrong with the previous code. I have
> checked it many times but have not been able to fix the problem. Could
> you please help me? Also, it would be of great help if you could write
> out the commands for using valgrind.
>
> Regards,
>
> Ravi
>
>
> Quoting Jeff Squyres <jsquyres_at_[hidden]>:
>
>> It sounds like one of your MPI processes is crashing, and the other is
>> detecting that (i.e., you're trying to MPI_Recv on MCW rank 1 and the
>> peer that it's trying to receive from is dead).
>>
>> You should run your code through a memory-checking debugger such as
>> valgrind to see if it can point out any obvious (or non-obvious)
>> problems. Check the debugging section of the LAM FAQ for restrictions
>> on using memory-checking debuggers.
>>
>>
>> On Feb 22, 2005, at 1:15 PM, Kumar, Ravi Ranjan wrote:
>>
>>> Hello,
>>>
>>> Could anyone explain why I am getting errors like:
>>>
>>> MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
>>> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
>>> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
>>> Rank (1, MPI_COMM_WORLD): - main()
>>> -----------------------------------------------------------------------------
>>>
>>> One of the processes started by mpirun has exited with a nonzero exit
>>> code. This typically indicates that the process finished in error.
>>> If your process did not finish in error, be sure to include a "return
>>> 0" or "exit(0)" in your C code before exiting the application.
>>>
>>> PID 22326 failed on node n0 with exit status 1.
>>> -----------------------------------------------------------------------------
>>>
>>> I am attaching part of my subroutine which calls MPI_Send/MPI_Recv.
>>>
>>> Here is the part of my code:
>>>
>>> void exchange_interface_data(int rank, int local_Nz, int comm_tag)
>>> {
>>>     MPI_Status status;
>>>
>>>     if (rank == 0)
>>>         MPI_Send(&T[local_Nz-1][0][0], Nx*Ny, MPI_DOUBLE, rank+1,
>>>                  comm_tag, MPI_COMM_WORLD);
>>>
>>>     if (rank == 1)
>>>         MPI_Recv(&T[rows_per_process*rank-1][0][0], Nx*Ny, MPI_DOUBLE,
>>>                  rank-1, comm_tag, MPI_COMM_WORLD, &status);
>>> }
>>>
>>> Please help me out.
>>>
>>> Thanks!
>>> Ravi
>>>
>>
>> --
>> {+} Jeff Squyres
>> {+} jsquyres_at_[hidden]
>> {+} http://www.lam-mpi.org/
>>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/