Dear all,
I'm trying to use LAM/MPI 7.1.1 with Berkeley Lab Checkpoint/Restart
(BLCR) 0.4.0 and kernel 2.4.26, but I can't get things to work correctly.
Is anybody using BLCR to checkpoint MPI applications?
I can checkpoint and restart sequential programs and programs that use
POSIX threads without problems, so BLCR itself seems to work.
As described in the User's Guide, I start my MPI programs with
$ mpirun -np 3 -ssi rpi crtcp -ssi cr blcr checkpoint_mpi
When I call cr_checkpoint with the PID of mpirun, only a single
checkpoint file, containing the context of mpirun, is saved. I cannot
find any context files for my application processes. I also tried
linking my application directly against libcr.so, but this did not help.
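For reference, the procedure above can be sketched as a small shell script. This is only a sketch under stated assumptions: the LAM daemons are already running (lamboot), checkpoint_mpi is found exactly as in the mpirun command above, and cr_checkpoint uses BLCR's default output name context.<pid> in the current directory. The function names list_context_files and checkpoint_run, and the 10-second delay, are hypothetical. The script makes no assumption about where LAM puts per-process context files; it only lists what appears in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print the names of BLCR context files found in a
# directory (default: current directory). Assumes BLCR's default naming,
# context.<pid>.
list_context_files() {
    dir=${1:-.}
    for f in "$dir"/context.*; do
        # If nothing matches, the glob stays literal; -e filters it out.
        [ -e "$f" ] && printf '%s\n' "${f##*/}"
    done
    return 0
}

# Hypothetical sketch of the steps described above: start the job in the
# background, checkpoint mpirun by its PID, then see which context files
# were written. The 10-second delay is an arbitrary placeholder.
checkpoint_run() {
    mpirun -np 3 -ssi rpi crtcp -ssi cr blcr checkpoint_mpi &
    mpirun_pid=$!
    sleep 10
    cr_checkpoint "$mpirun_pid"
    list_context_files .
}
```

With the behaviour described above, list_context_files would show only the file for mpirun's own PID.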
Does anybody have an idea what I could have done wrong?
Heiko
P.S.: The output of laminfo is:
bauke_at_hal:~ $ laminfo
LAM/MPI: 7.1.1
Prefix: /usr
Architecture: i686-pc-linux-gnu
Configured by: root
Configured on: Thu Mar 17 18:09:34 CET 2005
Configure host: hal
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C++ compiler: g++
Fortran compiler: g77
Fortran symbols: double_underscore
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: blcr (API v1.0, Module v1.1)
SSI cr: self (API v1.0, Module v1.0)
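The mpirun command above selects the crtcp RPI and the blcr CR module, and both appear in this laminfo listing. A quick sanity check on any node is to look for both lines in laminfo's output. The function name has_cr_modules is hypothetical, and the patterns assume the "SSI <kind>: <module> (...)" layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: read laminfo output on stdin and succeed only if
# both the crtcp RPI module and the blcr CR module are listed.
has_cr_modules() {
    info=$(cat)
    printf '%s\n' "$info" |
        grep -Eq '^[[:space:]]*SSI rpi:[[:space:]]+crtcp([[:space:]]|$)' || return 1
    printf '%s\n' "$info" |
        grep -Eq '^[[:space:]]*SSI cr:[[:space:]]+blcr([[:space:]]|$)' || return 1
    return 0
}
```

Usage: `laminfo | has_cr_modules && echo "crtcp and blcr available"`.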
--
-- Women are astonished at how much men forget. Men
-- are astonished at what women remember.
-- (Peter Bamm, German writer, 1897-1975)
-- Heiko Bauke @ http://www.uni-magdeburg.de/bauke