It looks like MPI_COMM_WORLD rank 0 is exiting before it hits the bcast
(or perhaps during the bcast). Are you getting corefiles? Do you know
that (rank == 0) is reaching the Bcast statement, and not dying in the
"//do some work" area?
On Oct 7, 2004, at 1:33 AM, Anil K Erukala wrote:
> Hi,
>
> In my program, the master node reads data (just the first line) from
> 64 files. All the files are in one directory. After some computation it
> broadcasts one array of size 64. The total no. of processors is 4. My
> program structure looks like this.
>
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
> // fileArray is an array of size n
> int fileArray[n];
>
> PM pm( a, b); // Creating an object
>
> if( rank == 0 )
> {
> // Read first line from files
>
> // do some work
> }
>
> MPI_Bcast( fileArray, n, MPI_INT, 0, MPI_COMM_WORLD );
> // Here print the contents of the fileArray
>
> MPI_Barrier( MPI_COMM_WORLD );
>
> MPI_Finalize();
>
> When I did this, I got the following errors.
>
> MPI_Recv: process in local group is dead (rank 1, SSI:coll:smp:local
> comm for CID 0)
> MPI_Recv: process in local group is dead (rank 1, comm 3)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD): - MPI_Recv()
> Rank (1, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (1, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (1, MPI_COMM_WORLD): - main()
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD): - MPI_Recv()
> Rank (2, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (2, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (2, MPI_COMM_WORLD): - main()
> MPI_Recv: process in local group is dead (rank 1, SSI:coll:smp:local
> comm for CID 0)
> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
> Rank (3, MPI_COMM_WORLD): - MPI_Recv()
> Rank (3, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (3, MPI_COMM_WORLD): - MPI_Bcast()
> Rank (3, MPI_COMM_WORLD): - main()
>
> I think when I use the collective communication routines MPI_Bcast and
> MPI_Barrier, one of the processes is exiting from the group
> MPI_COMM_WORLD without waiting for the other processes. Please give me
> some suggestions to fix the above problem. Thanks in advance.
>
> Anil
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/