On Mar 20, 2005, at 1:22 AM, Kumar, Ravi Ranjan wrote:
> Below is the subroutine I am using for data exchange between different
> processes. In my code, I need to solve for 101x101x101 points in a 3D
> domain. For this I defined a 3D array T[101][101][101] dynamically, and
> to parallelize the problem I divided T[Nz][Nx][Ny], along Nz, into
> several slices. Each processor works on a slice and needs interface data
> from the neighbouring nodes. For exchanging interface data, I am using
> non-blocking MPI_Isend/MPI_Irecv; see the subroutine below:
The error message you posted earlier indicated that the failure was coming
from a call to MPI_Recv. This function only calls MPI_Irecv, so the error
does not appear to be coming from here. You will need to look through the
rest of your application to find the MPI_Recv that is actually failing.
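If it is not obvious where that MPI_Recv lives, one thing you can do is
probe each incoming message and compare its size against the buffer you are
about to post, so the failing receive identifies itself before LAM aborts.
The fragment below is only a sketch, not code from your application;
checked_recv and its arguments are placeholder names, and it assumes the
data being exchanged are doubles:

#include <mpi.h>
#include <stdio.h>

/* Sketch only: report a truncation before it happens by probing the
 * matching message and checking its element count against the buffer. */
static void checked_recv(double *buf, int count, int src, int tag)
{
    MPI_Status status;
    int incoming, rank;

    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Block until a matching message is available, without receiving it. */
    MPI_Probe(src, tag, MPI_COMM_WORLD, &status);
    MPI_Get_count(&status, MPI_DOUBLE, &incoming);

    if (incoming > count) {
        fprintf(stderr, "rank %d: posted %d doubles, but message from rank %d"
                " with tag %d contains %d\n", rank, count, src, tag, incoming);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Recv(buf, count, MPI_DOUBLE, src, tag, MPI_COMM_WORLD, &status);
}

Swapping something like this in for the plain MPI_Recv calls in the rest of
your program should tell you which receive is posting a buffer smaller than
the message that actually arrives.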
Hope this helps,
Brian
> void exchange_interface_data_T(.....)
> {
>     MPI_Status status;
>     MPI_Request request;
>
>     if(rank%2==0 && rank != num_processes-1){
>         MPI_Isend(&T[local_Nz][0][0], Nx*Ny, MPI_DOUBLE, rank+1, comm_tag,
>                   MPI_COMM_WORLD, &request);
>         MPI_Request_free(&request);
>     }
>     else if(rank%2==1){
>         MPI_Irecv(&T[0][0][0], Nx*Ny, MPI_DOUBLE, rank-1, comm_tag,
>                   MPI_COMM_WORLD, &request);
>         MPI_Wait(&request, &status);
>     }
>
>     if(rank%2==1){
>         MPI_Isend(&T[1][0][0], Nx*Ny, MPI_DOUBLE, rank-1, comm_tag+51,
>                   MPI_COMM_WORLD, &request);
>         MPI_Request_free(&request);
>     }
>     else if(rank%2==0 && rank != num_processes-1){
>         MPI_Irecv(&T[local_Nz+1][0][0], Nx*Ny, MPI_DOUBLE, rank+1, comm_tag+51,
>                   MPI_COMM_WORLD, &request);
>         MPI_Wait(&request, &status);
>     }
>
>     if(rank%2==0 && rank != 0){
>         MPI_Isend(&T[1][0][0], Nx*Ny, MPI_DOUBLE, rank-1, comm_tag+101,
>                   MPI_COMM_WORLD, &request);
>         MPI_Request_free(&request);
>     }
>     else if(rank%2==1 && rank != num_processes-1){
>         MPI_Irecv(&T[local_Nz+1][0][0], Nx*Ny, MPI_DOUBLE, rank+1, comm_tag+101,
>                   MPI_COMM_WORLD, &request);
>         MPI_Wait(&request, &status);
>     }
>
>     if(rank%2==1 && rank != num_processes-1){
>         MPI_Isend(&T[local_Nz][0][0], Nx*Ny, MPI_DOUBLE, rank+1, comm_tag+201,
>                   MPI_COMM_WORLD, &request);
>         MPI_Request_free(&request);
>     }
>     else if(rank%2==0 && rank != 0){
>         MPI_Irecv(&T[0][0][0], Nx*Ny, MPI_DOUBLE, rank-1, comm_tag+201,
>                   MPI_COMM_WORLD, &request);
>         MPI_Wait(&request, &status);
>     }
> }
>
> This is how I am approaching data exchange between neighbouring nodes
> (slices). Am I doing something wrong in the data exchange? Please advise.
>
> Thanks a lot!
> Ravi R. Kumar
>
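On the exchange itself: the parity-based pairing above is easy to get
wrong, and the MPI_Request_free calls give up the only handle you have for
knowing when a send buffer is safe to reuse. One common alternative is
MPI_Sendrecv with MPI_PROC_NULL at the domain boundaries. The sketch below
is not your code; exchange_halos is a placeholder name, and it assumes T is
a double*** whose slices T[k] are each a contiguous block of Nx*Ny doubles
(which your existing Isend/Irecv calls already require), with ghost slices
at T[0] and T[local_Nz+1]:

#include <mpi.h>

/* Sketch only: send the top interior slice up and the bottom interior
 * slice down, filling both ghost slices; MPI_PROC_NULL makes the calls
 * no-ops at the ends of the domain. */
void exchange_halos(double ***T, int local_Nz, int Nx, int Ny,
                    int rank, int num_processes, int comm_tag)
{
    MPI_Status status;
    int up   = (rank == num_processes - 1) ? MPI_PROC_NULL : rank + 1;
    int down = (rank == 0)                 ? MPI_PROC_NULL : rank - 1;

    /* Top interior slice goes up; bottom ghost slice arrives from below. */
    MPI_Sendrecv(&T[local_Nz][0][0], Nx * Ny, MPI_DOUBLE, up,   comm_tag,
                 &T[0][0][0],        Nx * Ny, MPI_DOUBLE, down, comm_tag,
                 MPI_COMM_WORLD, &status);

    /* Bottom interior slice goes down; top ghost slice arrives from above. */
    MPI_Sendrecv(&T[1][0][0],            Nx * Ny, MPI_DOUBLE, down, comm_tag + 1,
                 &T[local_Nz + 1][0][0], Nx * Ny, MPI_DOUBLE, up,   comm_tag + 1,
                 MPI_COMM_WORLD, &status);
}

Each MPI_Sendrecv pairs its own send and receive, so every rank posts a
matching receive of exactly Nx*Ny doubles for every message it can be sent,
and there is no outstanding send whose completion you have to guess at.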
> Quoting Brian Barrett <brbarret_at_[hidden]>:
>
>> On Mar 20, 2005, at 12:12 AM, Kumar, Ravi Ranjan wrote:
>>
>>> I wrote a code in C++ using MPI. It works fine and gives the correct
>>> result for a smaller 3D array, e.g. T[51][51][51]. However, the code
>>> hangs when I try to run the same problem at a larger size, i.e.
>>> T[101][101][101], with the error message below:
>>>
>>> MPI_Recv: message truncated (rank 0, MPI_COMM_WORLD)
>>> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
>>> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
>>> Rank (0, MPI_COMM_WORLD): - main()
>>
>> <snip>
>>
>>> I read some time ago that this may be due to a mismatch between the
>>> number of elements sent and the number of elements received in an
>>> MPI_Send/MPI_Recv pair. I have checked this many times and found no
>>> mismatch in the amount of data exchanged, yet I am still getting this
>>> error. What can be the reason for this? Could anyone please explain?
>>
>> The reason is exactly as you surmised. For some reason, a message has
>> arrived that is bigger than the buffer you posted. It's hard to tell
>> why this is occurring, but I would look carefully at your send/recv
>> pairs again. These are hard errors to debug, as LAM is in an error
>> condition and doesn't give you much information about what happened. I
>> notice you are using blocking receives - this helps a little bit, in
>> that you can print out what messages are being received (and their
>> sizes) and you can print out the size of the buffer you are providing
>> to MPI_Recv. If you send a big message and post an ANY_SOURCE receive,
>> Murphy's law pretty much guarantees the messages will arrive in the
>> worst order possible.
>>
>>
>> Hope that helps,
>>
>> Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/