Heiko Bauke wrote:
> Dear all,
>
> I'm trying to use LAM/MPI 7.1.1 with Berkeley Lab Checkpoint/Restart
> 0.4.0 and kernel 2.4.26, but I can't get things to work correctly. Is
> anybody using BLCR to checkpoint MPI applications?
>
> I can checkpoint and restart sequential programs and programs that use
> POSIX threads without problems. So, BLCR seems to work.
>
> As described in the User's Guide, I start my MPI programs with
>
> $ mpirun -np 3 -ssi rpi crtcp -ssi cr blcr checkpoint_mpi
>
> When I call cr_checkpoint with the PID of mpirun, only a single
> checkpoint file with the context of mpirun is saved. I cannot find
> any context files for my application processes. I also tried linking my
> application directly against libcr.so, but this did not help.
>
> Does anybody have an idea what I could have done wrong?
Try checkpointing the first process started by mpirun instead of mpirun
itself. I think this will work (I tried it a few months ago, and it
worked).
Regards,
Zoltan
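
The suggestion above needs the PID of an application process rather than
that of mpirun. A minimal sketch of one way to look it up follows. It
assumes that "the first process started by mpirun" is the oldest local
process named checkpoint_mpi (the binary name from the mpirun line quoted
above), and that pgrep from procps is available. first_pid_of is a
hypothetical helper, not part of LAM/MPI or BLCR.

```shell
# Hypothetical helper (not part of LAM/MPI or BLCR): print the PID of the
# oldest running process whose name is exactly "$1".
# pgrep -o selects the oldest match, -x requires an exact name match.
first_pid_of() {
    pgrep -o -x "$1"
}

# Usage, assuming the application binary is named checkpoint_mpi:
#   pid=$(first_pid_of checkpoint_mpi) && cr_checkpoint "$pid"
```

Whether the oldest matching process really is the one that should be
checkpointed depends on how the job was launched, so check the PID with
ps before passing it to cr_checkpoint.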
>
>
> Heiko
>
>
> P.S.: output of laminfo is:
>
> bauke@hal:~ $ laminfo
> LAM/MPI: 7.1.1
> Prefix: /usr
> Architecture: i686-pc-linux-gnu
> Configured by: root
> Configured on: Thu Mar 17 18:09:34 CET 2005
> Configure host: hal
> Memory manager: ptmalloc2
> C bindings: yes
> C++ bindings: yes
> Fortran bindings: yes
> C compiler: gcc
> C++ compiler: g++
> Fortran compiler: g77
> Fortran symbols: double_underscore
> C profiling: yes
> C++ profiling: yes
> Fortran profiling: yes
> C++ exceptions: no
> Thread support: yes
> ROMIO support: yes
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI boot: globus (API v1.1, Module v0.6)
> SSI boot: rsh (API v1.1, Module v1.1)
> SSI boot: slurm (API v1.1, Module v1.0)
> SSI coll: lam_basic (API v1.1, Module v7.1)
> SSI coll: shmem (API v1.1, Module v1.0)
> SSI coll: smp (API v1.1, Module v1.2)
> SSI rpi: crtcp (API v1.1, Module v1.1)
> SSI rpi: lamd (API v1.0, Module v7.1)
> SSI rpi: sysv (API v1.0, Module v7.1)
> SSI rpi: tcp (API v1.0, Module v7.1)
> SSI rpi: usysv (API v1.0, Module v7.1)
> SSI cr: blcr (API v1.0, Module v1.1)
> SSI cr: self (API v1.0, Module v1.0)
>