Hi Jeff & Liu,
Would you take a look at my message and give me some hints?
I downloaded lam-7.1.2b30 and installed it, but I ran into two problems:
1. If lamd exits after I invoke "cr_checkpoint --term
${pid_mpirun}", the subsequent "cr_restart context.${pid_mpirun}" cannot
restart the whole program. That is, the program cannot be restarted in a new
LAM session/universe.
2. Even if lamd does not exit, when I invoke "cr_checkpoint --term
${pid_mpirun}" multiple times, "cr_restart" always restarts the
program from the first (earliest) checkpoint, which means the subsequent
checkpoints have no effect. In particular, it seems that I cannot checkpoint
the restarted application. Is that right?
Would you help me?
Thanks!
Yuan