LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-10-11 08:50:19


These error messages mean that processes 2-7 tried to receive from a
peer process that they later found to be dead, so they aborted.

Check whether you got any core files from the other two processes (0 and
1); perhaps they segfaulted or otherwise died, leaving some kind of clue as
to why.
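
As a rough illustration of that check, here is a minimal Python sketch that walks a directory tree and lists files whose names look like core files. The helper name `find_core_files` and the name pattern (`core` or `core.<pid>`) are assumptions for illustration only; the actual core file name and location depend on how each node is configured, and no core file is written at all if the core size limit is zero.

```python
import os
import re

# Assumed naming convention: "core" or "core.<pid>" (e.g. "core.9075").
# The real pattern is system-dependent, so adjust it as needed.
CORE_NAME = re.compile(r"^core(\.\d+)?$")


def find_core_files(root):
    """Return sorted paths under root whose file name looks like a core file."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if CORE_NAME.match(name):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Search from the current directory, e.g. the directory the job ran in.
    for path in find_core_files("."):
        print(path)
```

Since the job ran on two nodes (n0 and n1), a search like this would have to be run on each node unless the working directory is on a shared filesystem.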

On 10/11/06 4:37 AM, "Jeffrey B. Layton" <laytonjb_at_[hidden]> wrote:

> Good morning,
>
> I got LAM working (turns out I was missing some files in
> the installation - sorry about the bother on that one). I built
> the code with PGI-6.1 and LAM-7.1.2 and tried running the
> code. I get past MPI_INIT and then the code hangs. I get
> the following error messages from the code:
>
>
> 9075 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n0 (o)
> 9076 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n0 (o)
> 9077 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n0 (o)
> 9078 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n0 (o)
> 23787 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n1
> 23788 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n1
> 23789 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n1
> 23790 ./cfl3d_pgi6.1_lam-7.1.2_mpi running on n1
> MPI_Recv: process in local group is dead (rank 4, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 3, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 5, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 6, MPI_COMM_WORLD)
> MPI_Recv: process in local group is dead (rank 7, MPI_COMM_WORLD)
> Rank (4, MPI_COMM_WORLD): Call stack within LAM:
> Rank (4, MPI_COMM_WORLD): - MPI_Recv()
> Rank (4, MPI_COMM_WORLD): - main()
> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
> Rank (3, MPI_COMM_WORLD): - MPI_Recv()
> Rank (3, MPI_COMM_WORLD): - main()
> Rank (6, MPI_COMM_WORLD): Call stack within LAM:
> Rank (6, MPI_COMM_WORLD): - MPI_Recv()
> Rank (6, MPI_COMM_WORLD): - main()
> Rank (5, MPI_COMM_WORLD): Call stack within LAM:
> Rank (5, MPI_COMM_WORLD): - MPI_Recv()
> Rank (5, MPI_COMM_WORLD): - main()
> Rank (7, MPI_COMM_WORLD): Call stack within LAM:
> Rank (7, MPI_COMM_WORLD): - MPI_Recv()
> Rank (7, MPI_COMM_WORLD): - main()
>
>
> This is an 8 process run and it looks like process 0 and 1 did something
> naughty. Any ideas?
>
> Thanks!
>
> Jeff

-- 
Jeff Squyres
Server Virtualization Business Unit
Cisco Systems