
LAM/MPI General User's Mailing List Archives


From: Prabhanjan Kambadur (pkambadu_at_[hidden])
Date: 2005-04-07 00:46:57


This is a snippet from your copy_myworker program.

===============================================================
     /* command to spawn & merge */
     if(msg==1){
       char programname[100];   /* BUG: never initialized */
       MPI_Info info;           /* also uninitialized; MPI_INFO_NULL would be safe */
       int root=0,maxprocs=1;

       MPI_Comm_spawn(programname,MPI_ARGV_NULL,maxprocs,info,
           root,intracomm,&intercomm,MPI_ERRCODES_IGNORE );
       fprintf(fp,"spawned...\n");fflush(fp);
       MPI_Intercomm_merge(intercomm,1,&intracomm);
       fprintf(fp,"merged...\n");fflush(fp);

       MPI_Comm_rank(intracomm,&mynewrank);
       MPI_Comm_size(intracomm,&mynewsize);
       fprintf(fp,"now im %d of %d\n",mynewrank,mynewsize);fflush(fp);
     }
=================================================================

Notice that the variable "programname" is never initialized, so
MPI_Comm_spawn is handed garbage and raises an error that causes all
processes to abort. Once this was corrected, the program ran for me
without any problems.
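For reference, here is a minimal corrected sketch of the spawn/merge step.
The child executable name "./worker" and the substitution of MPI_INFO_NULL
for the uninitialized info handle are my assumptions; swap in your actual
program name and info settings:

```c
/* corrected spawn/merge sketch; "./worker" is a placeholder for the
 * real child executable */
#include <stdio.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm intracomm, intercomm;
    int mynewrank, mynewsize;
    int root = 0, maxprocs = 1;
    char programname[100];

    MPI_Init(&argc, &argv);

    /* in the real worker, intracomm is the result of merging with the
     * parent; MPI_COMM_SELF keeps this sketch self-contained */
    intracomm = MPI_COMM_SELF;

    strcpy(programname, "./worker");   /* the missing initialization */

    /* MPI_INFO_NULL instead of the uninitialized MPI_Info handle */
    MPI_Comm_spawn(programname, MPI_ARGV_NULL, maxprocs, MPI_INFO_NULL,
                   root, intracomm, &intercomm, MPI_ERRCODES_IGNORE);

    /* high = 1, as in the original snippet */
    MPI_Intercomm_merge(intercomm, 1, &intracomm);

    MPI_Comm_rank(intracomm, &mynewrank);
    MPI_Comm_size(intracomm, &mynewsize);
    printf("now I'm %d of %d\n", mynewrank, mynewsize);

    MPI_Finalize();
    return 0;
}
```

Note that this needs an MPI runtime (mpicc/mpirun) and a separate
"./worker" binary to actually execute.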

Hope this helps,
Anju

On Tue, 5 Apr 2005, wrote:

> Hello.
>
> I've attached the code for my manager and slave processes. I've also included
> the logging output from a run that should illustrate the problem.
>
>
> QUICK SUMMARY
> The master spawns a child and merges to form the intracomm.
> From that intracomm, there are two processes: the master (rank 0 of 2) and the
> slave (rank 1 of 2).
> Then the master signals the slave (sends a message containing the integer 1) to
> participate in a collective spawn/merge, so a second slave comes up. From the
> intracomm returned by the merge, there are now three processes: master
> (rank 0 of 3), slave1 (rank 2 of 3) and slave2 (rank 2 of 3)!! Both slaves are
> saying that they are 2 of 3.
>
> The problem is that the program then hangs at a barrier. I'm guessing it is
> because both slaves are calling themselves the same thing.
>
>
> I can't seem to understand what is wrong here.
> Thanks again for any help.
>
> --dp
>
>
>
>
> Quoting Jeff Squyres <jsquyres_at_[hidden]>:
>
> > Can you send a small code example that shows this problem? That would
> > be most helpful.
> >
> > Thanks!
> >
> > On Apr 1, 2005, at 9:21 AM, "" <petrovic_at_[hidden]> wrote:
> >
> > > Hello all.
> > >
> > > I'm struggling with something that seems to be a familiar topic on
> > > this mailing
> > > list. Any help would be appreciated.
> > >
> > > I'm trying to have a 'master' program start up a number of 'slave'
> > > programs by a
> > > series of spawn calls. (I know I can spawn multiple programs with one
> > > call to
> > > spawn or spawn_multiple, but for other reasons, I must do it this
> > > way...).
> > >
> > > The general problem is trying to get an intracommunicator that
> > > includes the
> > > whole bunch. I understand that I can use spawn and intercomm_merge,
> > > and that
> > > these calls are collective. This seems to work fine except when I run
> > > on certain
> > > nodes on the cluster I am working on; from some logging, it seems that
> > > two
> > > processes end up thinking that they are the same rank from a given
> > > intracomm.
> > >
> > >
> > > Here are the steps:
> > >
> > > **master**
> > > use MPI_COMM_SELF as starting intracomm
> > > loop begin
> > > (notify existing processes to collectively spawn/merge)
> > > spawns a process using intracomm
> > > merges the returned intercomm (from the spawn) into intracomm
> > > loop end
> > >
> > >
> > > **slave**
> > > merges parent intercomm into intracomm.
> > > loop begin
> > > if notified, spawn (using intracomm)
> > > merge (using intercomm returned from spawn) into intracomm
> > > loop end
> > >
> > >
> > >
> > >
> > > also, the master is changing the "lam_spawn_sched_round_robin" key
> > > before each
> > > spawn, if that might be an issue...
> > >
> > > Any ideas?
> > > Thanks in advance!
> > > --dp
> > >
> > > -----------------------------------------------------------------
> > > This mail was sent through IMP Webmail at http://www.imp3.tut.fi/
> > > -----------------------------------------------------------------
> > > _______________________________________________
> > > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > >
> >
> > --
> > {+} Jeff Squyres
> > {+} jsquyres_at_[hidden]
> > {+} http://www.lam-mpi.org/
> >
> >
>
>
>
>