On Wed, 14 May 2003, Vikas Daga wrote:
> /u/dagav 10:31:46>mpirun -v n0-3 ring
> 27097 ring running on n0 (o)
> 15952 ring running on n1
> 21310 ring running on n2
> 7622 ring running on n3
> Enter the number of times around the ring: 5
> Process 0 sending 5 to 1
> MPI_Recv: process in local group is dead (rank 0, MPI_COMM_WORLD)
> Rank 0: Call stack within LAM:
> Rank 0: - MPI_Recv()
> Rank 0: - main()
This clearly shouldn't happen in a proper LAM setup. Are you sure that
you have the same version of LAM/MPI installed on all nodes, and that the
"ring" that is found on all nodes is compiled with the same version of
LAM? Finally, you might also want to check that it works properly with
all four processes on one node, e.g., "mpirun n0 n0 n0 n0 ring".
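
One way to check whether the "ring" executable is really the same file on
every node is to compare checksums. The following is only a minimal
sketch, not a LAM tool: the helper names (file_digest, group_by_digest,
binaries_match) are hypothetical, and it assumes you run file_digest on
each node yourself and collect the results by hand. Differing digests
show that the binaries differ. Matching digests do not prove that the
LAM installations themselves are the same version.

```python
import hashlib
from collections import defaultdict


def file_digest(path):
    """Return the SHA-256 hex digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large executables are not loaded all at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def group_by_digest(digests):
    """Group node names by digest.

    `digests` maps a node name (e.g. "n0") to the hex digest collected
    on that node. Nodes that share a digest end up in the same list.
    """
    groups = defaultdict(list)
    for node, digest in sorted(digests.items()):
        groups[digest].append(node)
    return dict(groups)


def binaries_match(digests):
    """Return True if every node reported the same digest."""
    return len(set(digests.values())) <= 1
```

If group_by_digest returns more than one group, the smaller group is a
reasonable place to start looking for a stale or separately compiled
"ring".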
You might also want to upgrade to the latest LAM/MPI if you can -- 6.5.9.
It's been so long since the 6.3 series that I don't even remember all the
"known issues" that we've fixed since then. :-)
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/