Hello,
Thanks for the reply!
I checked my code. The subroutine 'void exchange_interface_data(int
rank, int local_Nz, int comm_tag)' is called at two places in the main
program. There is no 'if' condition around either call, so it is called by
all the processes. Can calling this subroutine twice be a reason for this
error? Below is the part of my program where I call this subroutine:
-------------------------------------------------------------------
comm_tag = n;
exchange_interface_data(rank, local_Nz, comm_tag);
if(rank==1)
    for(k=0; k<Nz; k++)
        cout<<"rank = "<<rank<<" T = ["<<k<<"]["<<temp1<<"]["<<temp2<<"] = "
            <<T[k][temp1][temp2]<<endl;
Red_SOR(A, F, T, rank, local_Nz);
comm_tag = n+1;
exchange_interface_data(rank, local_Nz, comm_tag);
Black_SOR(A, F, T, rank, local_Nz);
-------------------------------------------------------------
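To sanity-check the call pattern itself, here is a minimal sketch in plain C++ (no MPI calls) that models the sends and receives posted by the two calls above. It assumes exactly two ranks and the send/receive logic of the subroutine as posted earlier in this thread. The names Op, post_exchange, count_unmatched and unmatched_in_iteration are hypothetical and exist only for this illustration. It only models tag matching and says nothing about what LAM does internally.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record of one point-to-point call; a model, not an MPI type.
struct Op {
    int src;
    int dst;
    int tag;
};

// Record what one call of exchange_interface_data would post on a given rank:
// rank 0 sends to rank 1, rank 1 receives from rank 0, both using comm_tag.
void post_exchange(int rank, int comm_tag,
                   std::vector<Op>& sends, std::vector<Op>& recvs) {
    if (rank == 0) sends.push_back({rank, rank + 1, comm_tag});
    if (rank == 1) recvs.push_back({rank - 1, rank, comm_tag});
}

// Pair each send with a receive that has the same source, destination and tag,
// and return how many operations are left without a partner.
int count_unmatched(const std::vector<Op>& sends, std::vector<Op> recvs) {
    int unmatched = 0;
    for (const Op& s : sends) {
        bool found = false;
        for (std::size_t i = 0; i < recvs.size(); ++i) {
            if (recvs[i].src == s.src && recvs[i].dst == s.dst &&
                recvs[i].tag == s.tag) {
                recvs.erase(recvs.begin() + static_cast<std::ptrdiff_t>(i));
                found = true;
                break;
            }
        }
        if (!found) ++unmatched;
    }
    // Receives that were never consumed are unmatched as well.
    return unmatched + static_cast<int>(recvs.size());
}

// Model one iteration of the main loop: two exchanges with tags n and n+1.
// If rank1_calls is false, rank 1 skips both calls (the conditional-call case).
int unmatched_in_iteration(int n, bool rank1_calls) {
    std::vector<Op> sends, recvs;
    for (int rank = 0; rank < 2; ++rank) {
        if (rank == 1 && !rank1_calls) continue;
        post_exchange(rank, n, sends, recvs);
        post_exchange(rank, n + 1, sends, recvs);
    }
    return count_unmatched(sends, recvs);
}
```

In this model, unmatched_in_iteration(n, true) returns 0: when both ranks make both calls, every send has a receive with the same tag. With rank1_calls set to false it returns 2, because both sends from rank 0 are left without a partner.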
Can the structure of my program cause this type of error? Please clarify.
Regards,
Ravi Kumar
Quoting Aditya Datey <avdatey_at_[hidden]>:
> Please make sure the subroutine 'void exchange_interface_data(int
> rank, int local_Nz, int comm_tag)'
> is called by all the nodes and not just the sending node (or
> receiving node).
>
> i.e., is this subroutine call made under another
> if(noderank == something) statement?
> If it is, then it will most certainly crash, since the other node will
> never get to call the subroutine.
>
> I'm a beginner with MPI and I frequently got this error message due to
> incorrect code path design in my initial programs.
>
> Hope this helps,
> Aditya Datey.
>
>
>
>
> On Tue, 2005-02-22 at 14:34, bcruchet_at_[hidden] wrote:
> > In some cases this happens when you try to write (MPI_Send) when there is
> > nobody to receive (MPI_Recv).
> >
> > In other words, you are probably sending data to a node that is not
> > waiting to receive it.
> >
> > -------------------------------------------------
> > Boris Cruchet C.
> > http://boris.guliv.cl
> > http://guliv.cl
> > -------------------------------------------------
> >
> > > Hello,
> > >
> > > Could anyone explain why I am getting errors like:
> > >
> > > MPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
> > > Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> > > Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> > > Rank (1, MPI_COMM_WORLD): - main()
> > > ---------------------------------------------------------------------
> > >
> > > One of the processes started by mpirun has exited with a nonzero exit
> > > code. This typically indicates that the process finished in error.
> > > If your process did not finish in error, be sure to include a "return
> > > 0" or "exit(0)" in your C code before exiting the application.
> > >
> > > PID 22326 failed on node n0 with exit status 1.
> > >
> > > ---------------------------------------------------------------------
> > >
> > > I am attaching part of my subroutine which calls MPI_Send/MPI_Recv.
> > >
> > > here is the part of my code:
> > >
> > > void exchange_interface_data(int rank, int local_Nz, int comm_tag)
> > > {
> > >
> > > MPI_Status status;
> > >
> > > if(rank==0)
> > >     MPI_Send(&T[local_Nz-1][0][0], Nx*Ny, MPI_DOUBLE,
> > >              rank+1, comm_tag, MPI_COMM_WORLD);
> > >
> > > if(rank==1)
> > >     MPI_Recv(&T[rows_per_process*rank-1][0][0], Nx*Ny, MPI_DOUBLE,
> > >              rank-1, comm_tag, MPI_COMM_WORLD, &status);
> > >
> > > }
> > >
> > > Please help me out.
> > >
> > > Thanks!
> > > Ravi
> > >
> > > _______________________________________________
> > > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > >