LAM/MPI General User's Mailing List Archives

From: Kumar, Ravi Ranjan (rrkuma0_at_[hidden])
Date: 2005-03-01 16:23:21


Hello,

I am using MPI in my code to solve a Red-Black SOR problem. I need to
exchange data (using MPI_Send/MPI_Recv) many times in a loop to get the final
result. To achieve this, I have to run two loops, as shown in my code below.
The inner loop calls the exchange_interface_data_T routine repeatedly. I get
correct results after the first completion of the main loop. However, the code
gets stuck while executing the main loop for the second time. I am using
MPI_Barrier so that all processes complete the task before starting the main
loop again. Is that correct? The error I get is shown after the code. Please
suggest possible reasons for the failure. Thanks for your help!

for (n = 1; n <= Nt; n++)   // MAIN LOOP STARTS
{
    exchange_interface_data_old_T(rank, local_Nz, comm_tag);

    calculate_F(rank, local_Nz, n);

    i2 = 0;

    // INNER LOOP STARTS
    do {
        Red_SOR(A, F, T, rank, local_Nz, i2, error);

        exchange_interface_data_T(rank, local_Nz, comm_tag);

        Black_SOR(A, F, T, rank, local_Nz, i2, error);

        exchange_interface_data_T(rank, local_Nz, comm_tag);

        max_error = max_error_norm(error, i2, rank);

    } while (max_error > tolerance);
    // INNER LOOP ENDS

    for (k = rank*rows_per_process; k < local_Nz; k++)
        for (i = 0; i < Nx; i++)
            for (j = 0; j < Ny; j++)
                u[k][i][j] = (1 + 2*Tq/t) * T[k][i][j]
                           + (1 - 2*Tq/t) * old_T[k][i][j] - u[k][i][j];

    for (k = rank*rows_per_process; k < local_Nz; k++)
        for (i = 0; i < Nx; i++)
            for (j = 0; j < Ny; j++)
                old_T[k][i][j] = T[k][i][j];

    MPI_Barrier(MPI_COMM_WORLD);

    cout.precision(10);
    for (k = rank*rows_per_process; k < local_Nz; k++)
        cout << "rank = " << rank << " Temp = " << T[k][temp1][temp2] << endl;

    MPI_Barrier(MPI_COMM_WORLD);
}
// MAIN LOOP ENDS

MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): - MPI_Recv()
Rank (2, MPI_COMM_WORLD): - MPI_Barrier()
Rank (2, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------

One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 17142 failed on node n0 with exit status 1.
-----------------------------------------------------------------------------

regards,
Ravi R. Kumar
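
For readers unfamiliar with the scheme used in the loop above, here is a
minimal serial sketch of red-black SOR. It is not the Red_SOR/Black_SOR code
from this post, which is not shown. The 1D model problem (-u'' = f with fixed
end values) and the names sor_half_sweep and solve_red_black_sor are
assumptions made purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: one half-sweep of red-black SOR for -u'' = f on a
// uniform 1D grid with spacing h. The end points u[0] and u[N-1] are fixed
// boundary values. color 0 updates even interior points ("red"),
// color 1 updates odd interior points ("black").
// Returns the largest change applied to any point in this half-sweep.
double sor_half_sweep(std::vector<double>& u, const std::vector<double>& f,
                      double h, double omega, int color)
{
    assert(u.size() == f.size() && u.size() >= 3);
    double max_change = 0.0;
    for (std::size_t i = 1; i + 1 < u.size(); ++i) {
        if (static_cast<int>(i % 2) != color) continue;
        // Gauss-Seidel value from the two neighbours, then over-relax.
        double gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i]);
        double updated = (1.0 - omega) * u[i] + omega * gauss_seidel;
        max_change = std::max(max_change, std::fabs(updated - u[i]));
        u[i] = updated;
    }
    return max_change;
}

// Hypothetical driver: alternate red and black half-sweeps until the largest
// update falls to the tolerance or max_iters is reached.
// Returns the number of full (red + black) iterations performed.
int solve_red_black_sor(std::vector<double>& u, const std::vector<double>& f,
                        double h, double omega, double tolerance,
                        int max_iters)
{
    int iters = 0;
    double max_error = 0.0;
    do {
        double red = sor_half_sweep(u, f, h, omega, 0);
        // In a parallel code, interface values would be exchanged here.
        double black = sor_half_sweep(u, f, h, omega, 1);
        // ... and here, before the convergence test.
        max_error = std::max(red, black);
        ++iters;
    } while (max_error > tolerance && iters < max_iters);
    return iters;
}
```

In a parallel version of this loop, every rank has to make the same decision
in the while test. If each rank compares only its own local error against the
tolerance, ranks can leave the loop at different iterations and the matching
send/receive calls no longer pair up. Taking a global maximum of the error
(for example with MPI_Allreduce and MPI_MAX) is one common way to keep the
ranks in step.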