
LAM/MPI General User's Mailing List Archives


From: Junaid Ali (junaid_at_[hidden])
Date: 2004-12-05 00:52:46


Hello,

Thanks for the reply.

Indeed, my spawned process is getting initialized (I checked by writing to a file, and that works), but I am not able to send messages to or receive messages from the spawned process: the program crashes when I try.

Any ideas?
Thanks,
Junaid

----- Original Message -----
From: "Jeff Squyres" <jsquyres_at_[hidden]>
To: General LAM/MPI mailing list <lam_at_[hidden]>
Sent: Saturday, December 04, 2004 03:06 PM
Subject: LAM: spawned child not initing

 On Dec 4, 2004, at 4:39 PM, Junaid Ali wrote:
 
> My child process has printf() statements inside it which never get
> printed. This suggests to me that my child is never initialized.
 
 Actually, it doesn't. Standard output in a parallel environment is a
 very finicky thing -- there's buffering at potentially multiple
 levels. So just because you may do a printf() doesn't mean that it
 will appear right away (or at all). A better test would be to write to
 a local file that you can check after the fact.
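A minimal sketch of the file-based check suggested above, in plain C. The helper name `debug_log` and the file-name pattern `spawn_debug.<tag>.log` are hypothetical; the only assumption is that each process can write to its current working directory. Each call opens the file in append mode and closes it again, so every line leaves the process's stdio buffer before the next MPI call runs.

```c
#include <stdio.h>
#include <stdarg.h>

/* Hypothetical helper (not part of LAM/MPI): append one formatted line
 * to "spawn_debug.<tag>.log". Use a tag that is unique per process,
 * e.g. one built from the MPI rank. Returns 0 on success, -1 on error. */
static int debug_log(const char *tag, const char *fmt, ...)
{
    char path[256];
    snprintf(path, sizeof path, "spawn_debug.%s.log", tag);

    FILE *fp = fopen(path, "a");   /* append: keep earlier checkpoints */
    if (fp == NULL)
        return -1;

    va_list ap;
    va_start(ap, fmt);
    vfprintf(fp, fmt, ap);
    va_end(ap);
    fputc('\n', fp);

    /* fclose() flushes the stdio buffer and hands the data to the OS,
     * so the line survives even if the process later hangs or crashes. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

In both the parent and the clone, one might build `tag` from the rank (for example with `snprintf(tag, sizeof tag, "clone%d", rank)`) and call `debug_log(tag, "before MPI_Intercomm_merge")` and `debug_log(tag, "after MPI_Intercomm_merge")` around each MPI call; the last line in each process's file shows where that process stopped. Calling `fflush(stdout)` after each `printf()` only empties the process's own stdio buffer, so output forwarded through the MPI runtime may still be delayed, which is why a file is the more dependable check.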
 
> I read on the mpi-forum site
> (http://www.mpi-forum.org/docs/mpi-20-html/node95.htm#Node95), which
> says
>
> "An implementation may automatically establish communication before
> MPI_INIT is called by the children. Thus, completion of MPI_COMM_SPAWN
> in the parent does not necessarily mean that MPI_INIT has been called
> in the children (although the returned intercommunicator can be used
> immediately). ( End of advice to users.) "
 
 As I indicated in my last mail, in LAM, completion of MPI_COMM_SPAWN
 *does* mean that MPI_INIT completed in the newly-spawned processes.
 
> Thus all my operations in the parent group after the merge never
> complete. I tried using a Barrier after the merge as well, but all the
> processes block at that barrier, as the spawned process never reaches
> that barrier.
>
> Here is my spawn code, which runs in each process:
>
> MPI_Barrier(MPI_COMM_WORLD); // Wait for everybody in the group to call spawn
>
> MPI_Comm_spawn("clone", MPI_ARGV_NULL, 1, 0, size - 1, MPI_COMM_WORLD,
>                &intercommunicator, &errcode); // The root is the last process in the group
 
 Using "0" for the info argument is not portable; you should really use
 MPI_INFO_NULL if you have no info keys to pass.
 
> if(errcode!=MPI_SUCCESS)
> printf("\nSpawn failed");
>
> printf("\n(Rank %d) Size of the intercommunicator before join is %d\n", myrank, size);
 
 I'm assuming that size is actually the size of MPI_COMM_WORLD, not the
 intercommunicator (i.e., I don't see a new call to MPI_Comm_size with
 intercommunicator as the argument).
 
> MPI_Barrier(MPI_COMM_WORLD); // Wait for everybody in the group to call merge
>
> MPI_Intercomm_merge(intercommunicator,0,&finalcommunicator);
 
 Note that Intercomm_merge is a collective call, so it won't complete
 until everyone has invoked it.
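Once the merge completes, sends and receives have to use ranks in the merged communicator, not the ranks the processes had before. The MPI standard specifies that when one group passes `high = 0` and the other passes `high = 1` to `MPI_Intercomm_merge`, the "low" group is ordered first. The sketch below is a hypothetical helper, not an MPI function, that models the resulting rank under the additional assumption (common in implementations, but worth confirming with `MPI_Comm_rank` on the merged communicator) that each group keeps its internal rank order.

```c
/* Hypothetical model (not an MPI call) of the rank a process receives in
 * the intracommunicator produced by MPI_Intercomm_merge, when one group
 * passes high = 0 and the other passes high = 1.
 *
 * Assumption: the low group comes first and each group keeps its
 * original internal rank order.
 *
 *   local_rank - rank of the process within its own group
 *   high       - the high flag this process passed (0 or 1)
 *   low_size   - number of processes in the group that passed high = 0
 */
static int merged_rank(int local_rank, int high, int low_size)
{
    /* Low-group ranks are unchanged; high-group ranks follow them. */
    return high ? low_size + local_rank : local_rank;
}
```

With the code quoted in this thread, the parents pass `high = 0` and the single clone passes `high = 1`, so `merged_rank(0, 1, size)` is `size`: the parents would address the clone as rank `size` in `finalcommunicator`, and the clone's `MPI_Comm_size(final, &psize)` should report `size + 1`.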
 
> printf("\n(Rank %d) out of merge. Waiting at barrier", myrank); // This statement gets printed
>
> MPI_Barrier(finalcommunicator); // Wait for everybody to start again
>
>
> My clone program is (none of the printfs execute):
>
> MPI_Init(&argc,&argv);
>
> MPI_Comm_rank(MPI_COMM_WORLD,&rank);
> MPI_Comm_size(MPI_COMM_WORLD,&size);
> MPI_Comm_get_parent(&parentinter);
>
> printf("\nHello World.. The clone is born...");
> printf("\n(Clone) calling for Intercomm_merge\n");
>
> MPI_Intercomm_merge(parentinter,1,&final);
>
> MPI_Barrier(final); //Barrier for whole group
>
> MPI_Comm_size(final,&psize);
> MPI_Comm_rank(final,&newrank);
> printf("\n(Clone) My (Rank %d) Parent Communicator size is %d", newrank, psize);
> printf("\n(Clone) My (Old Rank %d) new is %d",rank,newrank);
 
 I don't see anything obviously wrong here, but I think you need to
 clarify exactly where the set of processes is hanging. It's *not* at
 the client's MPI_INIT.
 
 --
 {+} Jeff Squyres
 {+} jsquyres_at_[hidden]
 {+} http://www.lam-mpi.org/
 
 _______________________________________________
 This list is archived at http://www.lam-mpi.org/MailArchives/lam/
