Thanks for pointing out the mistake. The command went through, and it
generated three context files. But I am not sure whether the checkpoint
succeeded, as there is a problem restarting the processes.
Below are the results of the execution for your reference.
lamhosts <file content>
sys1 cpu=2
$ mpirun -v -np 2 -ssi rpi crtcp -ssi cr_blcr_base_dir . ./TestProg
$ ps -a
PID TTY TIME CMD
2589 pts/1 00:00:00 mpirun
2595 pts/2 00:00:00 ps
$ cr_checkpoint 2589
$ ls
context.2589
context.2589-n0-2590
context.2589-n0-2591
$ cr_restart context.2589
2701 /home/user/install/blcr-0.5.6/bin/cr_restart running on n0 (o)
4 ::: pid - 2591 From process 1 out of 2, Hello World! from sys1
2704 /home/user/install/blcr-0.5.6/bin/cr_restart running on n0 (o)
I was expecting the printf output from both process 0 and process 1, but
received it only from process 1. It looks like process 0 didn't restart. Any
suggestions or help would be appreciated.
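To make the missing output easier to spot, the restart output could be
captured to a file and scanned for ranks that never printed. The following is
only a minimal sketch: the function name missing_ranks is hypothetical, and it
assumes every process prints a line containing "From process <rank> out of
<total>", as TestProg does in the output above.

```shell
# Hypothetical helper: print the MPI ranks that produced no output.
# Assumes each process prints "From process <rank> out of <total>".
# Usage (assumed): cr_restart context.2589 > restart.log 2>&1
#                  missing_ranks restart.log
missing_ranks() {
    # $1 = file holding the captured restart output
    awk '
        match($0, /From process [0-9]+ out of [0-9]+/) {
            # Split only the matched text: From process <rank> out of <total>
            split(substr($0, RSTART, RLENGTH), w, " ")
            seen[w[3]] = 1       # w[3] is the rank
            total = w[6] + 0     # w[6] is the process count
        }
        END {
            # Report every rank below <total> that never appeared
            for (r = 0; r < total; r++)
                if (!(r in seen)) print r
        }
    ' "$1"
}
```

For the output shown above this would report rank 0. It only tells which ranks
printed nothing; it does not say why a process failed to restart.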
Also, one more clarification: what will the process ID of a restarted
process be? Will it be the old one from the first execution, or a new
process ID?
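One way to check this would be to read the original process IDs from the
per-process context file names and compare them with what ps reports after a
restart. This is a rough sketch, not a BLCR or LAM tool: the function name
checkpointed_pids is hypothetical, and it assumes the file names follow the
context.<mpirun pid>-n<node>-<pid> pattern seen in the ls output above.

```shell
# Hypothetical helper: list the PIDs recorded in per-process context file names.
# Assumes the naming pattern context.<mpirun pid>-n<node>-<pid>.
checkpointed_pids() {
    # $1 = mpirun PID passed to cr_checkpoint, $2 = directory (default: .)
    for f in "${2:-.}"/context."$1"-n*-*; do
        [ -e "$f" ] || continue     # skip the unexpanded glob when nothing matches
        printf '%s\n' "${f##*-}"    # text after the last "-" is the PID
    done
}
```

With the files listed above, "checkpointed_pids 2589" would print 2590 and
2591, which can then be compared against the PIDs of the restarted processes.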
Thanks,
Gopi
On 8/24/07, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> The module's name is "crtcp", not "crtp".
>
> On Aug 23, 2007, at 6:40 PM, Gopinatha wrote:
>
> > Hi,
> >
> > I have installed BLCR after installing LAM. When I tried to
> > checkpoint, it didn't work.
> >
> > mpirun -ssi rpi crtp -ssi cr blcr -x LD_LIBRARY_PATH ./TestProg
> >
> > I am getting the error "The rpi module named 'crtp' could not be
> > found." when executing the above command to run the job.
> > I guess I am missing some configuration. Any suggestions or help would be
> > appreciated.
> >
> > Regards,
> > Gopi
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> Jeff Squyres
> Cisco Systems
>