$ mpirun n0 -ssi cr blcr ./hello
If the process's PID is 6860, then:
$ lamcheckpoint -ssi cr blcr -pid 6860
$ lamrestart -ssi cr blcr -ssi cr_blcr_context_file context.mpirun.6860
This works fine.
However, if I want to do the same across multiple nodes:
$ mpirun n0-1 -ssi cr blcr ./hello
If the process's PID on this node is 6860, then:
$ lamcheckpoint -ssi cr blcr -pid 6860
This command waits forever, and the context file is never created.
So, how do I use lamcheckpoint and lamrestart across multiple nodes?
Thanks in advance,
Lenjoy