I tried to checkpoint MPI programs, but it failed. The details of the problem are as follows.
I first installed blcr-0.4.2 using the default installation method, then lam-7.0.6 with:
./configure --with-rsh="ssh -x" --prefix=/opt/lam-7.0.6
make
make install
When I run a single-CPU job, it runs and I am able to checkpoint it using:
cr_checkpoint -v <pid of prog-name>
However, when I run a multi-processor job with
mpirun -np 8 --ssi cr blcr [ .. some more options ..] xhpl
and try to checkpoint it using
cr_checkpoint -v --term <pid of mpirun>
I get the following output:
checkpoint request issued
checkpoint request completed
ioctl(/proc/checkpoint/ctrl, CR_OP_CHKPT_REAP): No such process
When I instead try to checkpoint it using <pid of prog-name>, the program simply hangs.
Can someone point out what is wrong?
Thanks for your time,
Anu