Hello,
Thanks for replying.
My child process contains printf() statements that never produce any output, which suggests to me that the child never gets initialized.
I read on the mpi-forum site (http://www.mpi-forum.org/docs/mpi-20-html/node95.htm#Node95), which says
"An implementation may automatically establish communication before MPI_INIT is called by the children. Thus, completion of MPI_COMM_SPAWN in the parent does not necessarily mean that MPI_INIT has been called in the children (although the returned intercommunicator can be used immediately). ( End of advice to users.) "
Thus, none of my operations in the parent group after the merge ever complete. I also tried adding a barrier after the merge, but all the processes block at that barrier because the spawned process never reaches it.
Here is my spawn function, which runs in each process:
MPI_Barrier(MPI_COMM_WORLD); //Wait for everybody in the group to call spawn
MPI_Comm_spawn("clone",MPI_ARGV_NULL,1,0,size-1,MPI_COMM_WORLD,&intercommunicator,&errcode); //The root is the last process in the group
if(errcode!=MPI_SUCCESS)
printf("\nSpawn failed");
printf("\n(Rank %d) Size of the intercommunicator before join is %d\n",myrank,size);
MPI_Barrier(MPI_COMM_WORLD); //Wait for everybody in the group to call merge
MPI_Intercomm_merge(intercommunicator,0,&finalcommunicator);
printf("\n(Rank %d) out of merge.Waiting at barrier",myrank); //This statement gets printed
MPI_Barrier(finalcommunicator); //Wait for everybody to start again
My clone program is as follows (none of its printf() calls produce any output):
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_get_parent(&parentinter);
printf("\nHello World.. The clone is born...");
printf("\n(Clone) calling for Intercomm_merge\n");
MPI_Intercomm_merge(parentinter,1,&final);
MPI_Barrier(final); //Barrier for whole group
MPI_Comm_size(final,&psize);
MPI_Comm_rank(final,&newrank);
printf("\n(Clone) My (Rank %d) Parent Communicator size is %d",newrank,psize);
printf("\n(Clone) My (Old Rank %d) new is %d",rank,newrank);
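One thing that may be worth ruling out before concluding that MPI_Init() never ran: the printf() calls above write to stdout without flushing, and when stdout is not a terminal the C library normally buffers it fully, so a process that blocks at a barrier may never emit its text. The sketch below is only a debugging aid under that assumption. format_trace() and trace() are hypothetical helper names, not part of MPI or LAM, and whether the child's stderr is forwarded to the terminal is something to verify on the actual setup.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an MPI or LAM function): build a one-line
 * trace message such as "(Clone rank 0) after MPI_Init\n" in buf.
 * Returns the value of snprintf(), i.e. the length of the full message. */
static int format_trace(char *buf, size_t len, const char *who, int rank,
                        const char *msg)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "(%s rank %d) %s\n", who, rank, msg);
}

/* Hypothetical helper: write the trace line to stderr and flush at once,
 * so the text is not left in a stdio buffer if the process later blocks. */
static void trace(const char *who, int rank, const char *msg)
{
    char buf[256];
    format_trace(buf, sizeof buf, who, rank, msg);
    fputs(buf, stderr);
    fflush(stderr);
}
```

Calling trace("Clone", rank, "after MPI_Init") right after MPI_Init() in the clone, and again before and after MPI_Intercomm_merge(), would show how far the child actually gets.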
Thanks for the help.
----- Original Message -----
From: "Jeff Squyres" <jsquyres_at_[hidden]>
To: General LAM/MPI mailing list <lam_at_[hidden]>
Sent: Friday, December 03, 2004 09:19 PM
Subject: LAM: spawned child not initing
On Dec 2, 2004, at 6:15 PM, mailtome_at_[hidden] wrote:
> i have a master process with an initial group. Now he spawns another
> process using MPI_Comm_Spawn() & then merges it into the current group
> using MPI_Intercomm_merge().
> The operations performed are successful, except for the spawned process
> initializing MPI i.e (child doesnt run MPI_Init()).
I'm not sure what you're saying here -- LAM's MPI_COMM_SPAWN does not
return until the newly-spawned processes have invoked MPI_INIT.
Hence, if MPI_COMM_SPAWN completes, then the children have successfully
started and invoked MPI_INIT (indeed, the establishment of
communication between the parent and children processes is one of the
last things that happens in MPI_INIT).
So I'm not quite sure what you're saying -- you indicate that
COMM_SPAWN and INTERCOMM_MERGE complete (both of which must mean that
the children have spawned and passed MPI_INIT), but then you say that
the children never invoke MPI_INIT. Can you clarify?
And can you indicate how you're sure that the children are not invoking
MPI_INIT?
Also, what version of LAM are you running?
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/