LAM/MPI General User's Mailing List Archives

From: Anil K Erukala (anile_at_[hidden])
Date: 2004-10-07 15:15:57


Thank you very much for the suggestion. The problem is fixed now. As you
said, I had made a small mistake in my code (in the "//do some work"
area). The program used to work perfectly fine for some cases, which is
why I didn't really look closely at that area. After your suggestion I
found the mistake there. Thanks for your help.

Anil

-----Original Message-----
From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf
Of Jeff Squyres
Sent: Thursday, October 07, 2004 7:05 AM
To: General LAM/MPI mailing list
Subject: Re: LAM: Problem with collective communication routines

It looks like MPI_COMM_WORLD rank 0 is exiting before it hits the bcast
(or perhaps during the bcast). Are you getting corefiles? Do you know
that (rank == 0) is reaching the Bcast statement, and not dying in the
"//do some work" area?

On Oct 7, 2004, at 1:33 AM, Anil K Erukala wrote:

> Hi,
>
> In my program the master node reads data (just the first line) from 64
> files. All the files are in one directory. After some computation it
> broadcasts one array of size 64. The total number of processors is 4.
> My program structure looks like this.
>
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   
> // fileArray is an array of size n
> int fileArray[n];
>
> PM pm( a, b); // Creating an object
>
> if( rank == 0 )
> {
>    // Read first line from files
>
>   // do some work
> }
>
> MPI_Bcast( fileArray, n, MPI_INT, 0, MPI_COMM_WORLD );
> // Here print the contents of the fileArray
>
> MPI_Barrier( MPI_COMM_WORLD );
>
> MPI_Finalize();
>
> When I run this I get the following error.
>
> MPI_Recv: process in local group is dead (rank 1, SSI:coll:smp:local
> comm for CID 0)
> MPI_Recv: process in local group is dead (rank 1, comm 3)
> Rank (1, MPI_COMM_WORLD): Call stack within LAM:
> Rank (1, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (1, MPI_COMM_WORLD):  - MPI_Bcast()
> Rank (1, MPI_COMM_WORLD):  - MPI_Bcast()
> Rank (1, MPI_COMM_WORLD):  - main()
> Rank (2, MPI_COMM_WORLD): Call stack within LAM:
> Rank (2, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (2, MPI_COMM_WORLD):  - MPI_Bcast()
> Rank (2, MPI_COMM_WORLD):  - MPI_Bcast()
> Rank (2, MPI_COMM_WORLD):  - main()
> MPI_Recv: process in local group is dead (rank 1, SSI:coll:smp:local
> comm for CID 0)
> Rank (3, MPI_COMM_WORLD): Call stack within LAM:
> Rank (3, MPI_COMM_WORLD):  - MPI_Recv()
> Rank (3, MPI_COMM_WORLD):  - MPI_Bcast()
> Rank (3, MPI_COMM_WORLD):  - MPI_Bcast()
> Rank (3, MPI_COMM_WORLD):  - main()
>
> I think that when I use the collective communication routines MPI_Bcast
> and MPI_Barrier, one of the processes is exiting from the group
> MPI_COMM_WORLD without waiting for the other processes. Please give me
> some suggestions to fix the above problem. Thanks in advance.
>
> Anil
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/