On Dec 4, 2004, at 4:39 PM, Junaid Ali wrote:
> my child process has printf() statements inside it which never get
> printed. This suggests me that my child is never inited.
Actually, it doesn't. Standard output in a parallel environment is a
very finicky thing: there can be buffering at multiple levels. So just
because you call printf() doesn't mean that the output will appear
right away (or at all). A better test would be to write to a local
file that you can check after the fact.
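As a rough sketch of that file-based test (the helper name debug_log
and the file naming scheme are made up for illustration, and getpid()
assumes a POSIX system), each process can append to its own file and
close it right away so nothing is left sitting in a stdio buffer:

```c
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: append one formatted line to a per-process log
   file. The file is opened and closed on every call, so the text is
   handed to the OS immediately rather than waiting in a buffer. */
int debug_log(const char *prefix, const char *fmt, ...)
{
    char path[256];
    FILE *fp;
    va_list ap;

    /* One file per process (keyed by PID) avoids interleaved output. */
    snprintf(path, sizeof path, "%s.%ld.log", prefix, (long)getpid());

    fp = fopen(path, "a");
    if (fp == NULL)
        return -1;

    va_start(ap, fmt);
    vfprintf(fp, fmt, ap);
    va_end(ap);
    fputc('\n', fp);

    /* fclose() flushes the stdio buffer before closing the file. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

In the clone program this could be called as, for example,
debug_log("/tmp/clone", "before Intercomm_merge, rank %d", rank);
immediately after MPI_Init and again before and after each collective
call, which shows how far each process actually got.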
> I read on the mpi-forum site
> (http://www.mpi-forum.org/docs/mpi-20-html/node95.htm#Node95), which
> says
>
> "An implementation may automatically establish communication before
> MPI_INIT is called by the children. Thus, completion of MPI_COMM_SPAWN
> in the parent does not necessarily mean that MPI_INIT has been called
> in the children (although the returned intercommunicator can be used
> immediately). ( End of advice to users.) "
As I indicated in my last mail, in LAM, completion of MPI_COMM_SPAWN
*does* mean that MPI_INIT completed in the newly-spawned processes.
> thus all my operations in the parent group after the merge never
> complete. I tried using a Barrier after the merge as well, but all the
> processes block at that barrier, as the spawned process never reaches
> that barrier.
>
> here is my spawn function run in each process:
>
> //Wait for everybody in the group to call spawn
> MPI_Barrier(MPI_COMM_WORLD);
>
> //The root is the last process in the group
> MPI_Comm_spawn("clone",MPI_ARGV_NULL,1,0,size-1,MPI_COMM_WORLD,
>                &intercommunicator,&errcode);
Using "0" for the info argument is not portable; you should really use
MPI_INFO_NULL if you have no info keys to pass.
> if(errcode!=MPI_SUCCESS)
> printf("\nSpawn failed");
>
> printf("\n(Rank %d) Size of the intercommunicator before join is "
>        "%d\n",myrank,size);
I'm assuming that size is actually the size of MPI_COMM_WORLD, not the
intercommunicator (i.e., I don't see a new call to MPI_Comm_size with
intercommunicator as the argument).
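To make the distinction concrete, here is a small sketch (the function
name report_sizes is hypothetical, and it has to be built with an MPI
compiler wrapper such as mpicc) that prints the three sizes that are
easy to mix up once MPI_Comm_spawn has returned. On an
intercommunicator, MPI_Comm_size reports the local group and
MPI_Comm_remote_size reports the remote group:

```c
#include <mpi.h>
#include <stdio.h>

/* Hypothetical helper: call in the parents after MPI_Comm_spawn. */
void report_sizes(MPI_Comm intercomm, int myrank)
{
    int world_size, local_size, remote_size;

    /* Size of the parents' MPI_COMM_WORLD. */
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* On an intercommunicator this is the size of the local group. */
    MPI_Comm_size(intercomm, &local_size);

    /* Size of the remote group, i.e. the spawned children. */
    MPI_Comm_remote_size(intercomm, &remote_size);

    printf("(Rank %d) world=%d local=%d remote=%d\n",
           myrank, world_size, local_size, remote_size);
    fflush(stdout);
}
```

With one child spawned from the code quoted above, I would expect the
remote size to be 1 and the local size to match the size of
MPI_COMM_WORLD in the parents.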
> //Wait for everybody in the group to call merge
> MPI_Barrier(MPI_COMM_WORLD);
>
> MPI_Intercomm_merge(intercommunicator,0,&finalcommunicator);
Note that MPI_Intercomm_merge is a collective call over both groups of
the intercommunicator, so it won't complete until every parent and
every spawned child has invoked it.
> //This statement gets printed
> printf("\n(Rank %d) out of merge.Waiting at barrier",myrank);
>
> //Wait for everybody to start again
> MPI_Barrier(finalcommunicator);
>
>
> my clone program is : (None of the printf's execute)
>
> MPI_Init(&argc,&argv);
>
> MPI_Comm_rank(MPI_COMM_WORLD,&rank);
> MPI_Comm_size(MPI_COMM_WORLD,&size);
> MPI_Comm_get_parent(&parentinter);
>
> printf("\nHello World.. The clone is born...");
> printf("\n(Clone) calling for Intercomm_merge\n");
>
> MPI_Intercomm_merge(parentinter,1,&final);
>
> MPI_Barrier(final); //Barrier for whole group
>
> MPI_Comm_size(final,&psize);
> MPI_Comm_rank(final,&newrank);
> printf("\n(Clone) My (Rank %d) Parent Communicator size is %d",
>        newrank,psize);
> printf("\n(Clone) My (Old Rank %d) new is %d",rank,newrank);
I don't see anything obviously wrong here, but I think you need to
clarify exactly where the set of processes is hanging. It's *not* at
the child's MPI_INIT.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/