LAM/MPI General User's Mailing List Archives

From: Valter Toffolo (toffolo_at_[hidden])
Date: 2003-08-05 12:22:19


We have a COW (cluster of workstations) whose machines are user workstations; each one enters cluster mode when it is idle. As a result, the number of machines in the cluster changes constantly, and lamgrow/lamshrink has to be run every time a workstation changes state (workstation/node).
What we need is an environment where the MPI application can run on available nodes, shrinking the environment when a node needs to go back to workstation mode, and growing it when a workstation is idle and goes to node mode.
Fault tolerance is already implemented, so losing a node is not a problem, and shrinking and growing the LAM environment is not the issue either. The application (actually a prototype we will use to build a framework) uses MPI_Comm_spawn to spawn a process on each available node when the program begins. As we get new nodes, lamgrow would be called and a process spawned on the new node.
The problem arises when a workstation switches to node mode and is added to the environment with lamgrow. I need a way to know, at the application level, that the LAM machine has grown. I've tried calling MPI_Attr_get periodically to read the universe size, but as long as the application is running, it returns the same universe size as on the first call, even after lamshrink or lamgrow has been run. So the first issue is: how will my application know that the LAM machine has grown?
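The periodic check described above can be separated from the question of where the size comes from. Below is a minimal C sketch of that bookkeeping, assuming a hypothetical callback (`size_query_fn`) that returns the current node count. None of these names come from LAM or MPI. In a real program the callback might wrap MPI_Attr_get on MPI_UNIVERSE_SIZE, but as noted above that value did not change under LAM during a run, so another source would be needed; `read_int_source` is only a stand-in for trying the logic out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type: returns the current number of nodes.
 * What it reads from is an assumption left to the application. */
typedef int (*size_query_fn)(void *ctx);

/* State kept between polls. */
struct universe_watch {
    size_query_fn query;  /* where the size comes from (assumption) */
    void *ctx;            /* opaque argument passed to query */
    int last_size;        /* size seen at the previous poll */
};

/* Stand-in size source for experiments: reads the size from an int. */
static int read_int_source(void *ctx)
{
    return *(int *)ctx;
}

/* Records the size seen at start-up. */
static void universe_watch_init(struct universe_watch *w,
                                size_query_fn query, void *ctx)
{
    assert(w != NULL && query != NULL);
    w->query = query;
    w->ctx = ctx;
    w->last_size = query(ctx);
}

/* Polls once and returns the change since the previous poll:
 * > 0 means the environment grew, < 0 means it shrank, 0 means no change. */
static int universe_watch_poll(struct universe_watch *w)
{
    assert(w != NULL);
    int now = w->query(w->ctx);
    int delta = now - w->last_size;
    w->last_size = now;
    return delta;
}
```

The application would call universe_watch_poll every once in a while from its main loop and spawn or retire processes according to the sign of the result.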
Once it knows the environment has grown, I will need to spawn another process on the new node. I could use MPI_Info_set with the "lam_spawn_sched_round_robin" key, set the initial node, and run one process there. However, I'm looking for a portable way to do it, if there is one. Either way, I need a mechanism that lets the application find out the number of the newly added node so that it can spawn a process there.
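To make the last requirement concrete: if the application could obtain the list of current node numbers from somewhere (how to do that is exactly the open question, so the input arrays here are an assumption), finding the newly added node reduces to comparing that list with the one it already knows. A minimal C sketch, with `find_new_nodes` as a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>

/* Writes to `added` every node number present in `now` but absent from
 * `before`, and returns how many were written. `added` must have room
 * for n_now entries. The quadratic scan is acceptable for the small
 * node counts of a workstation cluster. */
static size_t find_new_nodes(const int *before, size_t n_before,
                             const int *now, size_t n_now,
                             int *added)
{
    assert(added != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n_now; i++) {
        int seen = 0;
        for (size_t j = 0; j < n_before; j++) {
            if (before[j] == now[i]) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            added[count++] = now[i];
    }
    return count;
}
```

Each number returned in `added` is a candidate target for one new spawned process. Comparing lists rather than sizes also covers the case where one node leaves and another joins between two checks, which leaves the total unchanged.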

Regards, Toffolo.