I am working on creating a fault tolerance case. I started with the lam/examples/fault, although a recent posting has thrown some doubt on this.
I would like to have a MASTER (M) that spawns, say, 3 processes: A, B and C. I also want all the processes to run on different nodes, so 4 nodes in this case.
I believe MPI_COMM_SPAWN creates 3 separate intercommunicators for A, B and C, such that I will have M->A, M->B and M->C. That should be fine, assuming SPAWN will allow the spawned processes to be placed on different nodes. Will it?
Now I would like to create an intercommunicator for each of C->A and C->B.
I looked at MPI_INTERCOMM_MERGE, but as far as I can tell it talks of merging INTRA communicators to create an INTER communicator, whereas A, B and C all have INTER-communicators. So I looked at maybe forming some "groups", but it's not clear what "group" A, B and C belong to, since each seems to have its own MPI_COMM_WORLD.
Can anyone point me to an example or explain how this is done?
Thanks, Liberty