>>>Are you able to checkpoint / restart serial processes?
>>
>>Yes. Actually, I have integrated Torque+Maui with blcr as I am working
>>on some fault tolerance research. It would be great if I can checkpoint
>>LAM/MPI jobs as well.
>
>
> What version of LAM/MPI are you using? (I should have asked this in my
> prior mail -- sorry)
>
> There was a problem with BLCR support in 7.1 and 7.1.1 -- we fixed it a
> while ago in the 7.1.2 betas (see http://www.lam-mpi.org/beta/). This
> might well be your problem -- that BLCR support was effectively ignored
> in the MPI processes and therefore you only got the context file for
> mpirun.
Oh, this might be the problem: I am using 7.1.1. I will try the 7.1.2
beta and let you know.
--
Pradeep Padala
http://ppadala.blogspot.com