Thanks for pointing out the mistake. The command went through and it did generate three context files. However, I am not sure whether the checkpoint succeeded, because there is a problem restarting the processes. Below are the results of the execution for your reference.

Contents of the lamhosts file:
sys1 cpu=2

$ mpirun -v -np 2 -ssi rpi crtcp -ssi cr_blcr_base_dir . ./TestProg

$ ps -a
  PID TTY          TIME CMD
2589 pts/1    00:00:00 mpirun
2595 pts/2    00:00:00 ps

$ cr_checkpoint 2589

$ ls
context.2589
context.2589-n0-2590
context.2589-n0-2591

$ cr_restart context.2589
2701 /home/user/install/blcr-0.5.6/bin/cr_restart running on n0 (o)
4 ::: pid - 2591 From process 1 out of 2, Hello World! from sys1
2704 /home/user/install/blcr-0.5.6/bin/cr_restart running on n0 (o)

I was expecting printf output from both process 0 and process 1, but I received output only from process 1. It looks like process 0 didn't restart. Any suggestions or help would be appreciated.

One more question: what will the process ID of the restarted process be? Will it be the same PID as in the original run, or a new one?

Thanks,
Gopi

On 8/24/07, Jeff Squyres <jsquyres@cisco.com> wrote:
The module's name is "crtcp", not "crtp".

On Aug 23, 2007, at 6:40 PM, Gopinatha wrote:

> Hi,
>
> I installed BLCR after installing LAM. When I tried to
> checkpoint, it didn't work.
>
> mpirun -ssi rpi crtp -ssi cr blcr -x LD_LIBRARY_PATH ./TestProg
>
> I am getting the error "The rpi module named 'crtp' could not be
> found." when I run the above command to execute the job.
> I guess I am missing some configuration. Any suggestions or help
> would be appreciated.
>
> Regards,
> Gopi
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/


--
Jeff Squyres
Cisco Systems
