LAM/MPI General User's Mailing List Archives

From: Junaid Ali (junaid_at_[hidden])
Date: 2004-12-04 16:39:48


Hello,

Thanks for replying.

My child process has printf() statements in it that never get printed, which suggests to me that the child is never initialized.
I read the following on the MPI Forum site (http://www.mpi-forum.org/docs/mpi-20-html/node95.htm#Node95):

"An implementation may automatically establish communication before MPI_INIT is called by the children. Thus, completion of MPI_COMM_SPAWN in the parent does not necessarily mean that MPI_INIT has been called in the children (although the returned intercommunicator can be used immediately). ( End of advice to users.) "

As a result, none of my operations in the parent group after the merge ever complete. I also tried adding a barrier after the merge, but all the processes block at that barrier, because the spawned process never reaches it.

Here is my spawn function, which runs in each process:

MPI_Barrier(MPI_COMM_WORLD); // Wait for everybody in the group to call spawn

// The root is the last process in the group
MPI_Comm_spawn("clone", MPI_ARGV_NULL, 1, MPI_INFO_NULL, size-1, MPI_COMM_WORLD, &intercommunicator, &errcode);

if (errcode != MPI_SUCCESS)
    printf("\nSpawn failed");

printf("\n(Rank %d) Size of the intercommunicator before join is %d\n", myrank, size);

MPI_Barrier(MPI_COMM_WORLD); // Wait for everybody in the group to call merge

MPI_Intercomm_merge(intercommunicator, 0, &finalcommunicator);

printf("\n(Rank %d) out of merge. Waiting at barrier", myrank); // This statement gets printed

MPI_Barrier(finalcommunicator); // Wait for everybody to start again

My clone program is as follows (none of the printf() calls execute):

MPI_Init(&argc,&argv);

MPI_Comm_rank(MPI_COMM_WORLD,&rank);
MPI_Comm_size(MPI_COMM_WORLD,&size);
MPI_Comm_get_parent(&parentinter);

printf("\nHello World.. The clone is born...");
printf("\n(Clone) calling for Intercomm_merge\n");

MPI_Intercomm_merge(parentinter,1,&final);

MPI_Barrier(final); //Barrier for whole group

MPI_Comm_size(final,&psize);
MPI_Comm_rank(final,&newrank);
printf("\n(Clone) My (Rank %d) Parent Communicator size is %d",newrank,psize);
printf("\n(Clone) My (Old Rank %d) new is %d",rank,newrank);
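
A note on the missing output: a printf() that never shows up does not by itself prove that the child skipped MPI_Init. C stdio normally buffers stdout fully when it is not attached to a terminal, so text written just before a call that blocks (such as the barrier above) can stay in the buffer. Whether that is what happens here is an assumption, not something confirmed in this thread. The sketch below shows one way to rule it out; trace_line is a hypothetical helper, not part of MPI or LAM, that writes one line and flushes it immediately.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not an MPI or LAM function): write one formatted
 * diagnostic line to `out`, append a newline, and flush right away so the
 * text leaves the stdio buffer even if the process blocks afterwards.
 * Returns the number of characters written (including the newline),
 * or -1 on error. */
int trace_line(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vfprintf(out, fmt, ap);
    va_end(ap);

    if (n < 0)
        return -1;
    if (fputc('\n', out) == EOF)
        return -1;
    if (fflush(out) == EOF)
        return -1;
    return n + 1;
}
```

In the clone, calling trace_line(stdout, "(Clone) calling for Intercomm_merge") right after MPI_Init would show whether the child got that far, even if it later blocks in MPI_Barrier.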

Thanks for the help.

----- Original Message -----
From: "Jeff Squyres" <jsquyres_at_[hidden]>
To: General LAM/MPI mailing list <lam_at_[hidden]>
Sent: Friday, December 03, 2004 09:19 PM
Subject: LAM: spawned child not initing

 On Dec 2, 2004, at 6:15 PM, mailtome_at_[hidden] wrote:
 
> I have a master process with an initial group. It spawns another
> process using MPI_Comm_spawn() and then merges it into the current group
> using MPI_Intercomm_merge().
> The operations performed are successful, except for the spawned process
> initializing MPI (i.e., the child doesn't run MPI_Init()).
 
 I'm not sure what you're saying here -- LAM's MPI_COMM_SPAWN does not
 return until the newly-spawned processes have invoked MPI_INIT.
 
 Hence, if MPI_COMM_SPAWN completes, then the children have successfully
 started and invoked MPI_INIT (indeed, the establishment of
 communication between the parent and children processes is one of the
 last things that happens in MPI_INIT).
 
 So I'm not quite sure what you're saying -- you indicate that
 COMM_SPAWN and INTERCOMM_MERGE complete (both of which must mean that
 the children have spawned and passed MPI_INIT), but then you say that
 the children never invoke MPI_INIT. Can you clarify?
 
 And can you indicate how you're sure that the children are not invoking
 MPI_INIT?
 
 Also, what version of LAM are you running?
 
 --
 {+} Jeff Squyres
 {+} jsquyres_at_[hidden]
 {+} http://www.lam-mpi.org/
 
 _______________________________________________
 This list is archived at http://www.lam-mpi.org/MailArchives/lam/
