
LAM/MPI General User's Mailing List Archives


From: Shashwat Srivastav (ssrivast_at_[hidden])
Date: 2003-10-11 13:36:01


Hello,

Sorry for the delay in getting back to you. The code you sent is not
complete enough to make an accurate guess at the problem, but from the
error it looks as though rank 0 segfaulted. The most likely cause is
improper memory allocation in the ReceiveTree function. You may want to
use a tool such as bcheck, valgrind, or a debugger to track down the
memory-handling bug.
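
For concreteness, here is a minimal sketch of what an allocation-safe
ReceiveTree could look like. It assumes A, B and C are heap arrays sized
by Num and uses a hypothetical _Split layout, since the real
declarations and "new" statements were removed from the posted code:

  #include <mpi.h>

  // Hypothetical layout: the real _Split declaration was not shown.
  struct _Split {
    int    Num;
    float* A;
    short* B;
    int*   C;
  };

  _Split* ReceiveTree()
  {
    MPI_Status Status;
    int Num;

    // Learn how many elements are coming, and from which rank.
    MPI_Recv(&Num, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &Status);

    // Size the buffers to match *before* posting the payload receives.
    // Receiving Num elements into buffers that are too small is exactly
    // the kind of heap corruption that would kill rank 0.
    float* A = new float[Num];
    short* B = new short[Num];
    int*   C = new int[Num];

    // Take the payload from the same sender that supplied the count.
    MPI_Recv(A, Num, MPI_FLOAT, Status.MPI_SOURCE, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(B, Num, MPI_SHORT, Status.MPI_SOURCE, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    MPI_Recv(C, Num, MPI_INT,   Status.MPI_SOURCE, 4, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    // Package the arrays and return them. Note that the posted version
    // declares a _Split* return type but never returns a value.
    _Split* Tree = new _Split;
    Tree->Num = Num;
    Tree->A   = A;
    Tree->B   = B;
    Tree->C   = C;
    return Tree;
  }

Matching the three payload receives to Status.MPI_SOURCE is the right
idea; the crash is far more likely to come from the buffer sizing.
Running the job under valgrind, e.g. "mpirun -np 6 valgrind ./DT",
should flag any out-of-bounds writes in the receive buffers directly.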

--
Shashwat Srivastav
LAM / MPI Developer
Indiana University
ssrivast_at_[hidden]
On Tuesday, Oct 7, 2003, at 20:31 America/Chicago, USFResearch_at_[hidden] 
wrote:
> Hello everyone.
>  
> I am currently having segmentation fault issues when programming with 
> MPI in C++, and I'm hoping someone might be able to help me out. It 
> only happens when I am using more than 2 processors, and only when (a) 
> the communications are relatively small and (b) they must be done many 
> times.
>  
> Here are the two functions that use communication. I have 
> removed/renamed some variables and some "new" statements to make the 
> meat of the problem easier to see.
>  
> void TransmitTree(_Split* Tree)
> {
>   MPI_Send(&Num, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
>  
>   MPI_Send(A, Num, MPI_FLOAT, 0, 2, MPI_COMM_WORLD);
>   MPI_Send(B, Num, MPI_SHORT, 0, 3, MPI_COMM_WORLD);
>   MPI_Send(C, Num, MPI_INT, 0, 4, MPI_COMM_WORLD);
> }
>  
> _Split* ReceiveTree()
> {
>   MPI_Status Status;
>  
>   MPI_Recv(&Num, 1, MPI_INT, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD, &Status);
>  
>   MPI_Recv(A, Num, MPI_FLOAT, Status.MPI_SOURCE, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>   MPI_Recv(B, Num, MPI_SHORT, Status.MPI_SOURCE, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>   MPI_Recv(C, Num, MPI_INT, Status.MPI_SOURCE, 4, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
> }
> And here is how the functions are used:
>  
>       if (myrank==0)
>         for(x=myNumTrees;x<Options->NumberOfTrees;x++)
>           Trees[x]=ReceiveTree();
>       else
>         for(x=0;x<myNumTrees;x++)
>           TransmitTree(Trees[x]);
> Each processor has the number of "trees" stored in myNumTrees and 
> Options->NumberOfTrees is the total number of trees across all CPUs.  
> Each CPU other than rank0 must transfer all its trees to rank0.
>  
> Because of the preconditions for failure, I thought there might be 
> some buffer overflow issues (with rank0), so I tried using MPI_Ssend 
> instead of MPI_Send. I have also tried having all of them do either 
> MPI_Irecv or MPI_Isend followed by an MPI_Wait. I just can't seem to 
> get it right.
>  
> The errors look like this, but they vary in exactly where the error 
> occurs, so I can't track it down much further using "cout". It 
> appears to me that rank0 died, but I cannot figure out why:
>  
> mpirun -np 6 ./DT
>  
> MPI_Send: process in local group is dead (rank 3, MPI_COMM_WORLD)
> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
> Rank (3, MPI_COMM_WORLD):  - MPI_Send()
> Rank (3, MPI_COMM_WORLD):  - main()
> MPI_Send: process in local group is dead (rank 5, MPI_COMM_WORLD)
> Rank (5, MPI_COMM_WORLD): Call stack within LAM:
> Rank (5, MPI_COMM_WORLD):  - MPI_Send()
> Rank (5, MPI_COMM_WORLD):  - main()
> MPI_Send: process in local group is dead (rank 4, MPI_COMM_WORLD)
> Rank (4, MPI_COMM_WORLD): Call stack within LAM:
> Rank (4, MPI_COMM_WORLD):  - MPI_Send()
> Rank (4, MPI_COMM_WORLD):  - main()
> MPI_Send: process in local group is dead (rank 1, MPI_COMM_WORLD)
> MPI_Send: process in local group is dead (rank 2, MPI_COMM_WORLD)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Send()
> Rank (1, MPI_COMM_WORLD):  - main()
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD):  - MPI_Send()
> Rank (2, MPI_COMM_WORLD):  - main()
> Could someone please explain what I may be doing wrong? 
>  
> Thank you very much for your time,
> Robert
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

