
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-04-05 08:54:27


Can you send a small code example that shows this problem? That would
be most helpful.

Thanks!

On Apr 1, 2005, at 9:21 AM, "" <petrovic_at_[hidden]> wrote:

> Hello all.
>
> I'm struggling with something that seems to be a familiar topic on
> this mailing list. Any help would be appreciated.
>
> I'm trying to have a 'master' program start up a number of 'slave'
> programs through a series of spawn calls. (I know I can spawn multiple
> programs with one call to spawn or spawn_multiple, but for other
> reasons I must do it this way.)
>
> The general problem is getting an intracommunicator that includes all
> of these processes. I understand that I can use spawn and
> intercomm_merge, and that these calls are collective. This seems to
> work fine except when I run on certain nodes of the cluster I am
> working on; from some logging, it seems that two processes end up
> thinking that they have the same rank in a given intracomm.
>
>
> Here are the steps:
>
> **master**
> use MPI_COMM_SELF as starting intracomm
> loop begin
> (notify existing processes to collectively spawn/merge)
> spawns a process using intracomm
> merges the returned intercomm (from the spawn) into intracomm
> loop end
>
>
> **slave**
> merges parent intercomm into intracomm.
> loop begin
> if notified, spawn (using intracomm)
> merge (using intercomm returned from spawn) into intracomm
> loop end
>
> Also, the master changes the "lam_spawn_sched_round_robin" info key
> before each spawn, in case that might be an issue.
>
> Any ideas?
> Thanks in advance!
> --dp
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/