LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-08-12 07:27:57


On Aug 12, 2005, at 6:38 AM, Jim Lasc wrote:

> >  (6) When a nodenumber changes, and a message is between sender and
> > receiver, I can consider the message as lost, correct?
> >  When finished I want it to be totally decentralised, so that the new
> > node can connect with a node of its choice.
>
> To clarify: I don't know what you mean by "nodenumber" -- there is no
> such thing.  Every MPI process has a unique process rank in each
> communicator that it is in.  So if a process is in multiple
> communicators (and, by definition, they are each in at least
> MPI_COMM_WORLD and MPI_COMM_SELF), then they may have a different rank
> in each communicator.
> -> By nodenumber I mean the MPI rank in the communicator which contains ALL
> the processes/nodes (1 process per node).

Ok.
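
To make the point about ranks concrete, here is a small sketch in plain Python (not MPI code) of how one process can hold different ranks in different communicators. The helper split_ranks is hypothetical; it only mimics the ordering rule of MPI_Comm_split (same color shares a new communicator, ordered by key, ties broken by rank in the original communicator). It assumes every process supplies a valid color.

```python
def split_ranks(colors, keys=None):
    """Return {world_rank: (color, new_rank)}, mimicking MPI_Comm_split ordering.

    Hypothetical helper for illustration only; it is not part of any MPI binding.
    """
    if keys is None:
        keys = [0] * len(colors)
    result = {}
    for color in set(colors):
        # All processes that passed the same color end up in one new communicator.
        members = [r for r in range(len(colors)) if colors[r] == color]
        # New ranks follow the key; ties are broken by the original rank.
        members.sort(key=lambda r: (keys[r], r))
        for new_rank, world_rank in enumerate(members):
            result[world_rank] = (color, new_rank)
    return result


# Four processes in the "world"; even and odd world ranks form two groups.
world_size = 4
ranks = split_ranks([r % 2 for r in range(world_size)])
for world_rank in range(world_size):
    color, sub_rank = ranks[world_rank]
    # In the equivalent of MPI_COMM_SELF each process is always rank 0.
    print(f"world rank {world_rank}: group {color}, group rank {sub_rank}, self rank 0")
```

World rank 2, for example, is rank 1 in its group and rank 0 in its own single-process communicator, so "the rank of a process" only makes sense relative to a specific communicator.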

MPI message delivery is guaranteed (unless the source or destination
process dies *and* the MPI implementation is capable of handling such
faults without aborting). Take the following example:

- assume an MPI implementation that can handle process faults
- a communicator contains 3 processes: A, B, C
- A sends a message to C
- C has not received the message yet
- B dies
- C can (and should) still eventually receive the message

So no, messages are not lost.
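
The A/B/C scenario above can be sketched as a toy model in plain Python. This is not MPI code, and ToyCommunicator is a made-up class: it assumes one FIFO channel per (source, destination) pair and assumes a fault removes only the channels that touch the failed process. Whether a real MPI implementation survives the fault at all is implementation dependent.

```python
from collections import deque


class ToyCommunicator:
    """Toy in-process stand-in for a communicator; NOT an MPI API."""

    def __init__(self, names):
        self.alive = set(names)
        # One FIFO queue per (source, destination) pair, so traffic between
        # two processes never passes through a third one.
        self.channels = {(s, d): deque() for s in names for d in names if s != d}

    def send(self, src, dst, payload):
        if src not in self.alive or dst not in self.alive:
            raise RuntimeError("endpoint has failed")
        self.channels[(src, dst)].append(payload)

    def kill(self, name):
        # Model a process fault: only channels touching the dead process go away.
        self.alive.discard(name)
        for pair in list(self.channels):
            if name in pair:
                del self.channels[pair]

    def recv(self, dst, src):
        if dst not in self.alive:
            raise RuntimeError("receiver has failed")
        queue = self.channels.get((src, dst))
        if not queue:
            return None  # nothing pending, or the sender's channel is gone
        return queue.popleft()


comm = ToyCommunicator(["A", "B", "C"])
comm.send("A", "C", "hello")     # A sends to C; C has not received it yet
comm.kill("B")                   # B dies while the message is still pending
received = comm.recv("C", "A")   # C still gets the message
print(received)
```

In this model the pending A-to-C message is untouched by B's failure, which matches the last step of the list: C can still receive it.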

> >  (7)This means I should open a port on every node from the
> > "start-group", correct?
>
> I'm not sure what you mean here...?
> -> The start group is the nodes from MPI_COMM_WORLD (without any nodes
> added...)

Good. Let us know what you come up with as a final solution; there are
others trying to tackle similar problems.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/