LAM/MPI General User's Mailing List Archives

From: petrovic_at_[hidden]
Date: 2005-04-07 01:59:14


Hi. Thanks for the response.

Isn't the programname argument significant only at root?
I thought that since those are collective spawns made by the master with all the
existing workers, and since the master is always rank 0 in the comm used with
the spawn, then the programname would only have to be specified at root.
Is this not so?

Thanks again,
--dp
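
A note on the duplicate ranks reported in the quoted message below: the MPI standard says that all processes in one group of an intercommunicator should pass the same `high` value to MPI_Intercomm_merge, and that the group passing high=0 is ordered before the group passing high=1. The quoted worker snippet passes high=1 on the spawning side; which value the master passes is not shown in this thread, so whether that is the cause here is only a guess. The following is a minimal plain-C model of that ordering rule, not MPI code. The helper name `merged_rank` is hypothetical, and it assumes relative order inside each group is preserved.

```c
#include <stdio.h>

/* Hypothetical helper, NOT an MPI call: models the rank a process would
 * get from MPI_Intercomm_merge when every process in its group passes
 * local_high and every process in the other group passes remote_high.
 *
 *   local_rank   rank of the process inside its own group
 *   local_high   `high` flag used by the whole local group (0 or nonzero)
 *   remote_size  number of processes in the other group
 *   remote_high  `high` flag used by the whole remote group (0 or nonzero)
 *
 * Returns -1 when both groups use the same flag, because the standard
 * leaves the resulting order arbitrary in that case.
 * Assumption: relative order within each group is kept. */
int merged_rank(int local_rank, int local_high, int remote_size, int remote_high)
{
    int lh = local_high != 0;
    int rh = remote_high != 0;

    if (lh == rh)
        return -1;                       /* order not defined by the flags */

    /* The "low" group comes first; the "high" group follows it. */
    return lh ? remote_size + local_rank : local_rank;
}
```

With a parent group of master and slave1 both passing high=0 and the newly spawned slave2 passing high=1, `merged_rank(0, 0, 1, 1)` and `merged_rank(1, 0, 1, 1)` model the two parents and `merged_rank(0, 1, 2, 0)` models the child, giving three distinct ranks. The model says nothing about what an implementation does when processes within one group disagree on `high`.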

Quoting Prabhanjan Kambadur <pkambadu_at_[hidden]>:

>
> This is a snippet from your copy_myworker program.
>
> ===============================================================
> /* command to spawn & merge */
> if(msg==1){
> char programname[100];
> MPI_Info info;
> int root=0,maxprocs=1;
>
> MPI_Comm_spawn(programname,MPI_ARGV_NULL,maxprocs,info,
> root,intracomm,&intercomm,MPI_ERRCODES_IGNORE );
> fprintf(fp,"spawned...\n");fflush(fp);
> MPI_Intercomm_merge(intercomm,1,&intracomm);
> fprintf(fp,"merged...\n");fflush(fp);
>
> MPI_Comm_rank(intracomm,&mynewrank);
> MPI_Comm_size(intracomm,&mynewsize);
> fprintf(fp,"now im %d of %d\n",mynewrank,mynewsize);fflush(fp);
> }
> =================================================================
>
> Notice that the variable "programname" is never initialized and therefore
> MPI_Comm_spawn throws an exception causing all processes to abort. It
> worked for me without any problems once this was corrected.
>
> Hope this helps,
> Anju
>
>
> On Tue, 5 Apr 2005, wrote:
>
> > Hello.
> >
> > I've attached the code for my manager and slave processes. I've also
> > included the logging output from a run that should illustrate the problem.
> >
> >
> > QUICK SUMMARY
> > There is the master that spawns a child and merges for the intracomm.
> > From that intracomm, there are two processes: the master (rank0 of 2) and
> > the slave (rank1 of 2).
> > Then the master signals the slave (sends a msg with an integer 1) to
> > participate in a collective spawn/merge. So a second slave comes up. From
> > the intracomm returned from the merge, there are now three processes:
> > master (rank0 of 3) and slave1 (rank2 of 3) and slave2 (rank2 of 3)!! Both
> > slaves are saying that they are 2 of 3.
> >
> > The problem is that the program then hangs at a barrier. I'm guessing it
> > is because both slaves are calling themselves the same thing.
> >
> >
> > I can't seem to understand what is wrong here.
> > Thanks again for any help.
> >
> > --dp
> >
> >
> >
> >
> > Quoting Jeff Squyres <jsquyres_at_[hidden]>:
> >
> > > Can you send a small code example that shows this problem? That would
> > > be most helpful.
> > >
> > > Thanks!
> > >
> > > On Apr 1, 2005, at 9:21 AM, "" <petrovic_at_[hidden]> wrote:
> > >
> > > > Hello all.
> > > >
> > > > I'm struggling with something that seems to be a familiar topic on
> > > > this mailing list. Any help would be appreciated.
> > > >
> > > > I'm trying to have a 'master' program start up a number of 'slave'
> > > > programs by a series of spawn calls. (I know I can spawn multiple
> > > > programs with one call to spawn or spawn_multiple, but for other
> > > > reasons, I must do it this way...).
> > > >
> > > > The general problem is trying to get an intracommunicator that
> > > > includes the whole bunch. I understand that I can use spawn and
> > > > intercomm_merge, and that these calls are collective. This seems to
> > > > work fine except when I run on certain nodes on the cluster I am
> > > > working on; from some logging, it seems that two processes end up
> > > > thinking that they are the same rank from a given intracomm.
> > > >
> > > >
> > > > here are the steps:
> > > >
> > > > **master**
> > > > use MPI_COMM_SELF as starting intracomm
> > > > loop begin
> > > > (notify existing processes to collectively spawn/merge)
> > > > spawns a process using intracomm
> > > > merges the returned intercomm (from the spawn) into intracomm
> > > > loop end
> > > >
> > > >
> > > > **slave**
> > > > merges parent intercomm into intracomm.
> > > > loop begin
> > > > if notified, spawn (using intracomm)
> > > > merge (using intercomm returned from spawn) into intracomm
> > > > loop end
> > > >
> > > >
> > > >
> > > >
> > > > Also, the master is changing the "lam_spawn_sched_round_robin" key
> > > > before each spawn, if that might be an issue...
> > > >
> > > > Any ideas?
> > > > Thanks in advance!
> > > > --dp
> > > >
> > > > -----------------------------------------------------------------
> > > > This mail was sent through IMP Webmail at http://www.imp3.tut.fi/
> > > > -----------------------------------------------------------------
> > > > _______________________________________________
> > > > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > > >
> > >
> > > --
> > > {+} Jeff Squyres
> > > {+} jsquyres_at_[hidden]
> > > {+} http://www.lam-mpi.org/
> > >
> > >
> >
> >
> >
> >
> >
>
