Hello,
I intend to use dynamic resources to execute MPI-2 applications, but I
have run into problems combining 'lamgrow' with the 'MPI_Comm_spawn'
primitive.
For instance, I start a test application using
'lamboot -v nodefile' followed by
'mpirun -np 1 test'.
The 'test' program then spawns child processes onto the available nodes
using MPI_Comm_spawn. Up to this point the application runs exactly as
planned, i.e., there are processes running on all nodes of the LAM
environment.
During the application's execution, one new node is added to the LAM
environment using 'lamgrow -v xx'. Its output confirms the addition and
the update of the LAM environment. I also ran 'lamnodes', which
showed the LAM environment including the new node.
In the 'test' program, there is an 'if' that checks for changes in the
LAM environment and, if there are any, spawns new processes intended to
use the new resource.
Something like:
    if (nbNodes < getntype(0, 0x02)) {
        nbNodes = getntype(0, 0x02);
        MPI_Comm_spawn("child", MPI_ARGV_NULL, nbNodes, local_info, 0,
                       MPI_COMM_SELF, &comm, errcodes);
        ...
    }
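To make the check concrete, here is a minimal sketch of the bookkeeping
behind that 'if', written without LAM or MPI so that it compiles on its
own. The names node_tracker and nodes_added are hypothetical helpers,
not LAM or MPI calls; the current node count is assumed to come from the
same getntype(0, 0x02) query used in the fragment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of LAM or MPI): remembers the node
 * count seen at the last check. */
typedef struct {
    int known_nodes;
} node_tracker;

/* Returns how many nodes were added since the last check, or 0 if the
 * count did not grow. On growth, the new count is recorded.
 * current_nodes is assumed to be the value returned by the LAM query. */
int nodes_added(node_tracker *t, int current_nodes)
{
    assert(t != NULL && current_nodes >= 0);

    if (current_nodes <= t->known_nodes)
        return 0;   /* no growth; recorded count is left unchanged */

    int added = current_nodes - t->known_nodes;
    t->known_nodes = current_nodes;
    return added;
}
```

In the 'test' program, a non-zero return value would be the trigger for
the MPI_Comm_spawn call. Whether to spawn only that many children or
nbNodes children, as in the fragment, is a separate design choice.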
The problem is that after MPI_Comm_spawn the processes are created, but
only on the nodes that took part in the lamboot; no process is
spawned on the node added by lamgrow.
One workaround I found is to spawn a new 'test' program before spawning
the children, but this brings a performance overhead and cannot
be applied to all kinds of applications without a checkpoint
mechanism.
Is it possible to spawn processes onto resources added at runtime
without some kind of restart of the application?
Thank you!
Márcia C. Cera