I can't see why you would need MPI_Barrier here.
As we have suggested before, have you looked at the LAM FAQ for
debugging tips? For example, we strongly suggest running your code
through a memory-checking debugger.
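
For what it's worth, here is a minimal sketch of the pattern we usually
recommend for this kind of halo exchange (the neighbor ranks, buffer
names, and the exchange_and_reduce helper are illustrative, not taken
from your code): a paired MPI_Sendrecv avoids send/receive ordering
deadlocks without any barrier, and an MPI_Allreduce on the error norm
guarantees that every rank makes the same decision about leaving the
inner do/while loop.

    #include <mpi.h>

    /* Hypothetical helper: exchange one boundary plane of doubles with
       the "up" and "down" neighbors of a 1-D decomposition, then agree
       on a global error norm.  All names here are illustrative. */
    void exchange_and_reduce(double* top_send, double* top_recv,
                             double* bot_send, double* bot_recv,
                             int count, int up, int down, int tag,
                             double local_err, double* max_error)
    {
        MPI_Status status;

        /* Paired send/receive in a single call: no ordering deadlock
           is possible, and ranks at the domain edges can pass
           MPI_PROC_NULL to turn that direction into a no-op. */
        MPI_Sendrecv(top_send, count, MPI_DOUBLE, up,   tag,
                     bot_recv, count, MPI_DOUBLE, down, tag,
                     MPI_COMM_WORLD, &status);
        MPI_Sendrecv(bot_send, count, MPI_DOUBLE, down, tag,
                     top_recv, count, MPI_DOUBLE, up,   tag,
                     MPI_COMM_WORLD, &status);

        /* Every rank receives the same max_error, so every rank exits
           (or stays in) the convergence loop together. */
        MPI_Allreduce(&local_err, max_error, 1, MPI_DOUBLE, MPI_MAX,
                      MPI_COMM_WORLD);
    }

If your max_error_norm() does not already do a global reduction like
this, different ranks can disagree on when the do/while loop exits,
which would leave some processes blocked in an exchange that their
neighbors never enter -- and that would be consistent with a hang on
the second pass through the main loop.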
On Mar 1, 2005, at 4:23 PM, Kumar, Ravi Ranjan wrote:
> Hello,
>
> I am using MPI in my code to solve a Red-Black SOR problem. I need to
> exchange data (using MPI_Send/MPI_Recv) many times in a loop to get
> the final results. To achieve this, I have to run two loops, as shown
> in my code below. The inner loop calls the exchange_interface_data_T
> subroutine repeatedly. I get correct results after the first pass
> through the main loop. However, the code gets stuck while executing
> the main loop for the second time. I am using MPI_Barrier so that all
> processes complete their work before starting the main loop again. Is
> this correct? The error I get is shown after the code. Please suggest
> reasons for the failure of the code execution. Thanks for your help!
>
> for(n=1; n<=Nt; n++) // MAIN LOOP STARTS
> {
>     exchange_interface_data_old_T(rank, local_Nz, comm_tag);
>
>     calculate_F(rank, local_Nz, n);
>
>     i2 = 0;
>
>     // INNER LOOP STARTS
>     do {
>         Red_SOR(A, F, T, rank, local_Nz, i2, error);
>
>         exchange_interface_data_T(rank, local_Nz, comm_tag);
>
>         Black_SOR(A, F, T, rank, local_Nz, i2, error);
>
>         exchange_interface_data_T(rank, local_Nz, comm_tag);
>
>         max_error = max_error_norm(error, i2, rank);
>     } while(max_error > tolerance);
>     // INNER LOOP ENDS
>
>     for(k=rank*rows_per_process; k<local_Nz; k++)
>         for(i=0; i<Nx; i++)
>             for(j=0; j<Ny; j++)
>                 u[k][i][j] = (1 + 2*Tq/t) * T[k][i][j]
>                            + (1 - 2*Tq/t) * old_T[k][i][j]
>                            - u[k][i][j];
>
>     for(k=rank*rows_per_process; k<local_Nz; k++)
>         for(i=0; i<Nx; i++)
>             for(j=0; j<Ny; j++)
>                 old_T[k][i][j] = T[k][i][j];
>
>     MPI_Barrier(MPI_COMM_WORLD);
>
>     cout.precision(10);
>     for(k=rank*rows_per_process; k<local_Nz; k++)
>         cout << "rank = " << rank << " Temp = " << T[k][temp1][temp2] << endl;
>
>     MPI_Barrier(MPI_COMM_WORLD);
> }
> // MAIN LOOP ENDS
>
> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD): - MPI_Recv()
> Rank (2, MPI_COMM_WORLD): - MPI_Barrier()
> Rank (2, MPI_COMM_WORLD): - main()
> -----------------------------------------------------------------------------
>
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 17142 failed on node n0 with exit status 1.
> -----------------------------------------------------------------------------
>
> regards,
> Ravi R. Kumar
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/