Hello,
I am trying to use the checkpoint feature of LAM.
There is no problem when I checkpoint a LAM job. The job
seems to be correctly checkpointed:
mpirun -v -ssi rpi crtcp -ssi cr blcr -np 1 myjob
cr_checkpoint `pgrep mpirun`
(no output)
(context.PID created)
But I get the following error when I try to restart it:
cr_restart context.PID
mpirun: Bad file descriptor
Has anybody encountered this problem?
Is there more documentation on checkpointing than the "User's Guide"?
Sébastien
--
Sébastien Georget
INRIA Sophia-Antipolis, Service DREAM, B.P. 93
06902 Sophia-Antipolis Cedex, FRANCE
E-mail : sebastien.georget_at_[hidden]