LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-08-11 09:54:30


On Aug 10, 2005, at 4:03 PM, Jim Lasc wrote:

> Now I want to add nodes.
> By adding a node I mean the following:
> connecting a computer that is unknown at the time of startup (one I
> just bought, for example) to the ring, and allowing it (the new
> node) to speak with its neighbor nodes.
>
> (1) How should I implement that (see below...)?
> (2) When I use MPI_Comm_spawn, I can't "say" that it has to be
> spawned on the new node, because MPI decides itself where to spawn.
> Is this correct?

The inner workings of MPI_COMM_SPAWN are likely to be different in
each MPI implementation. LAM allows some degree of control over the
placement of processes on nodes -- see LAM's MPI_Comm_spawn(3) man
page.
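For reference, the MPI standard also reserves a "host" info key that
an implementation may honor as a placement hint for MPI_COMM_SPAWN.
Whether it is honored is implementation-dependent; the hostname below
is just a placeholder. An untested sketch:

    /* Hint that one "worker" process should start on a given host.
       "host" is a reserved MPI info key; support varies by
       implementation. */
    MPI_Info info;
    MPI_Comm children;

    MPI_Info_create(&info);
    MPI_Info_set(info, "host", "newnode.example.com");
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
    MPI_Info_free(&info);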

> (3) So I should use MPI_Open_port on a "master node" and connect
> the new node with the master node, correct?

That is one way to do it, yes.
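A minimal sketch of that handshake, assuming the port name reaches
the new node out-of-band (e.g., via a file, or MPI_Publish_name /
MPI_Lookup_name):

    /* On the master node: */
    char port[MPI_MAX_PORT_NAME];
    MPI_Comm newcomm;
    MPI_Open_port(MPI_INFO_NULL, port);
    /* ...make "port" available to the new node out-of-band... */
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &newcomm);

    /* On the new node (started as its own MPI "world"): */
    MPI_Comm newcomm;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &newcomm);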

> And, MPI_Comm_accept is blocking, so if I want the new node to be
> able to connect at any moment,
> (4) I should use a thread solely for the MPI_Comm_accept, is this
> correct?

That is also a common way to do it. However, be aware that your MPI
implementation must be thread safe to do this, and LAM/MPI is not. We
demonstrated exactly this last year at SC with Open MPI. Open MPI,
unfortunately, is not yet available to the public. :-\
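For reference, here is the general shape of that approach with a
thread-safe MPI implementation (an untested sketch with illustrative
names; per the above, it will not work under LAM/MPI):

    #include <mpi.h>
    #include <pthread.h>

    static char port[MPI_MAX_PORT_NAME];

    /* Dedicated thread: block in MPI_Comm_accept so new nodes can
       connect at any moment while the main thread keeps computing. */
    static void *accept_loop(void *arg)
    {
        while (1) {
            MPI_Comm newcomm;
            MPI_Comm_accept(port, MPI_INFO_NULL, 0,
                            MPI_COMM_SELF, &newcomm);
            /* ...hand newcomm off to the main thread... */
        }
        return NULL;
    }

    /* In main(): request full thread support and check it. */
    int provided;
    pthread_t tid;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        MPI_Abort(MPI_COMM_WORLD, 1);
    MPI_Open_port(MPI_INFO_NULL, port);
    pthread_create(&tid, NULL, accept_loop, NULL);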

> (5) When I use MPI_Intercomm_merge, is there a way to say that I
> want the nodes 0-n to keep their rank, and that I want the new node
> to have rank n+1?
> Because (see above) I posted a lot of Irecv's at the startup phase
> (and the Irecv's are re-posted once they are filled), I would prefer
> only having to change the Irecv's on nodes 0 and n instead of all
> the Irecv's
> (and this gives fewer problems for messages that are in flight
> between sender and receiver).

Keep in mind that when you INTERCOMM_MERGE, you're not adding processes
to an existing communicator -- you're getting an entirely new
communicator. So using this scheme, you'll have to ditch all your
prior requests and re-post them. This will likely be a fairly complex
scheme, because race conditions may occur (depending on your
implementation) with messages that were sent on the old communicator
but not yet received, etc.
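That said, MPI_INTERCOMM_MERGE does let you control the rank ordering:
the group that passes a false "high" argument is ordered first in the
merged communicator. A sketch (oldcomm and port are illustrative
names):

    /* Existing nodes: low side, so ranks 0..n keep their order. */
    MPI_Comm inter, merged;
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, oldcomm, &inter);
    MPI_Intercomm_merge(inter, 0 /* high = false */, &merged);

    /* New node: high side, so it ends up as rank n+1. */
    MPI_Comm inter, merged;
    MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &inter);
    MPI_Intercomm_merge(inter, 1 /* high = true */, &merged);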

Another option is to use pair-wise communicators and just use an array
of requests that you WAITANY (or some such) on. Then the communicator
that each process is located in doesn't matter too much during normal
operations of sending and receiving (except during startup, addition,
deletion, and shutdown).
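A sketch of that pattern, assuming one communicator and one pending
receive per neighbor (MAX_NEIGHBORS, BUFLEN, TAG, nneighbors, comms,
and bufs are all illustrative):

    MPI_Request reqs[MAX_NEIGHBORS];
    MPI_Status status;
    int i, idx;

    /* Post one MPI_Irecv per neighbor, each on its own
       communicator. */
    for (i = 0; i < nneighbors; ++i)
        MPI_Irecv(bufs[i], BUFLEN, MPI_BYTE, 0, TAG,
                  comms[i], &reqs[i]);

    while (1) {
        /* Wait for any neighbor's message, then re-post that
           receive. */
        MPI_Waitany(nneighbors, reqs, &idx, &status);
        /* ...process bufs[idx]... */
        MPI_Irecv(bufs[idx], BUFLEN, MPI_BYTE, 0, TAG,
                  comms[idx], &reqs[idx]);
    }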

> (6) When a node number changes, and a message is between sender and
> receiver, I can consider the message as lost, correct?
> When finished, I want it to be totally decentralized, so that the
> new node can connect with a node of its choice.

To clarify: I don't know what you mean by "node number" -- there is no
such thing. Every MPI process has a unique process rank in each
communicator that it is in. So if a process is in multiple
communicators (and, by definition, every process is in at least
MPI_COMM_WORLD and MPI_COMM_SELF), it may have a different rank in
each communicator.
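Concretely:

    /* The same process can have a different rank in each
       communicator it belongs to. */
    int world_rank, self_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);  /* 0 .. size-1 */
    MPI_Comm_rank(MPI_COMM_SELF, &self_rank);    /* always 0 */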

> (7) This means I should open a port on every node from the
> "start group", correct?

I'm not sure what you mean here...?

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/