Hello.
I've attached the code for my manager and slave processes. I've also included
the logging output from a run that should illustrate the problem.
QUICK SUMMARY
The master spawns a child and merges the resulting intercomm into an intracomm.
In that intracomm there are two processes: the master (rank 0 of 2) and the
slave (rank 1 of 2).
Then the master signals the slave (sends a message containing the integer 1)
to take part in a collective spawn/merge, so a second slave comes up. In the
intracomm returned by the merge there are now three processes: master (rank 0
of 3), slave1 (rank 2 of 3) and slave2 (rank 2 of 3)!! Both slaves say that
they are rank 2 of 3.
The problem is that the program then hangs at a barrier. I'm guessing this is
because both slaves believe they hold the same rank.
I can't seem to understand what is wrong here.
Thanks again for any help.
--dp
Quoting Jeff Squyres <jsquyres_at_[hidden]>:
> Can you send a small code example that shows this problem? That would
> be most helpful.
>
> Thanks!
>
> On Apr 1, 2005, at 9:21 AM, "" <petrovic_at_[hidden]> wrote:
>
> > Hello all.
> >
> > I'm struggling with something that seems to be a familiar topic on this
> > mailing list. Any help would be appreciated.
> >
> > I'm trying to have a 'master' program start up a number of 'slave'
> > programs by a series of spawn calls. (I know I can spawn multiple programs
> > with one call to spawn or spawn_multiple, but for other reasons I must do
> > it this way.)
> >
> > The general problem is getting an intracommunicator that includes the
> > whole bunch. I understand that I can use spawn and intercomm_merge, and
> > that these calls are collective. This seems to work fine, except when I run
> > on certain nodes of the cluster I am working on; from some logging, it
> > seems that two processes end up with the same rank in a given intracomm.
> >
> >
> > Here are the steps:
> >
> > **master**
> > use MPI_COMM_SELF as starting intracomm
> > loop begin
> > (notify existing processes to collectively spawn/merge)
> > spawns a process using intracomm
> > merges the returned intercomm (from the spawn) into intracomm
> > loop end
> >
> >
> > **slave**
> > merges parent intercomm into intracomm.
> > loop begin
> > if notified, spawn (using intracomm)
> > merge (using intercomm returned from spawn) into intracomm
> > loop end
> >
> >
> >
> >
> > Also, the master changes the "lam_spawn_sched_round_robin" key before
> > each spawn, in case that might be an issue...
> >
> > Any ideas?
> > Thanks in advance!
> > --dp
> >
> > -----------------------------------------------------------------
> > This mail was sent through IMP Webmail at http://www.imp3.tut.fi/
> > -----------------------------------------------------------------
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>