$ mpirun -np 9 -ssi cr blcr cr_verbose
Assume the PID of mpirun is 1234:
$ lamcheckpoint -ssi cr blcr -pid 1234
Perhaps you did not set your cr-base-file during the configure step. I don't know its default directory, but it is not the working directory.
So try to find the file whose name begins with "context.mpirun." under $LAMHOME.
Then try:
$ lamrestart -ssi cr blcr -ssi cr_blcr_context_file <your context.mpirun.1234's fullpath>
I hope this helps. If not, I'm afraid you will need to reinstall LAM/MPI with different configure parameters.
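
The search step above can be sketched as a pair of small shell helpers. This is only an illustration: the function names are made up here, and the directory to search is whatever you pass in (for example $LAMHOME or your home directory). The second helper follows the earlier advice to check for files whose names contain both "context" and the mpirun PID.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical, not part of LAM or BLCR.

# Print every regular file under directory $1 whose name
# begins with "context.mpirun.".
find_mpirun_contexts() {
    find "$1" -type f -name 'context.mpirun.*' 2>/dev/null
}

# Count the regular files under directory $1 whose name contains
# both "context" and the PID given as $2.
count_contexts_for_pid() {
    find "$1" -type f -name '*context*' -name "*$2*" 2>/dev/null \
        | wc -l | tr -d '[:space:]'
}

# Example use (commented out so that sourcing this file has no side effects):
#   find_mpirun_contexts "$LAMHOME"
#   count_contexts_for_pid "$LAMHOME" 1234
```

If the first helper prints a path, that path is what goes after cr_blcr_context_file in the lamrestart command above. If it prints nothing, the checkpoint most likely did not write any files where you searched.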
Hatem Ltaief <ltaief_at_[hidden]> wrote:
Thanks for your help.
I guess here is my problem: I do not have any of these 9 context files in
my CWD or home directory.
Using the lamcheckpoint and lamrestart commands gives the same error, and
still no context files are created for the running processes.
Any ideas?
Thanks,
Hatem
On Thu, 20 Apr 2006, Mars Lenjoy wrote:
> You'd better use lamcheckpoint and lamrestart to do that.
> If you want to use cr_checkpoint and the PID of mpirun is 1234,
> for example, try
> $ cr_checkpoint -f context.mpirun.1234 --run 1234
> Make sure that 9 context files whose names contain "1234" and "context" exist.
> Then
> $ cr_restart context.mpirun.1234
> It should work.
>
> hope it helps
>
>
> hatem ltaief wrote:
> Hi,
> I installed blcr and lam 7.1.2:
> [ltaief_at_compute-0-16 lammpi-cg_3D]$ laminfo -all
> LAM/MPI: 7.1.2
> SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
> SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
> SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
> SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
> SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
> SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
> SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
> SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
> SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
> SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
> SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
> SSI cr: blcr (SSI v1.0, API v1.0, Module v1.1)
> SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
> Prefix: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77
> Bindir:
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/bin
> Libdir:
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib
> Incdir:
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/include
> Pkglibdir:
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib/lam
> Sysconfdir:
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/etc
> Architecture: x86_64-unknown-linux-gnu
> Configured by: ltaief
> Configured on: Thu Apr 20 19:37:33 CDT 2006
> Configure host: medusa.tlc2.uh.edu
> Memory manager: ptmalloc2
> C bindings: yes
> C++ bindings: yes
> Fortran bindings: yes
> C compiler: gcc
> C char size: 1
> C bool size: 1
> C short size: 2
> C int size: 4
> C long size: 8
> C float size: 4
> C double size: 8
> C pointer size: 8
> C char align: 1
> C bool align: 1
> C int align: 4
> C float align: 4
> C double align: 8
> C++ compiler: g++
> Fortran compiler: g77
> Fortran symbols: double_underscore
> Fort integer size: 4
> Fort real size: 4
> Fort dbl prec size: 4
> Fort cplx size: 4
> Fort dbl cplx size: 4
> Fort integer align: 4
> Fort real align: 4
> Fort dbl prec align: 4
> Fort cplx align: 4
> Fort dbl cplx align: 4
> C profiling: yes
> C++ profiling: yes
> Fortran profiling: yes
> C++ exceptions: no
> Thread support: yes
> ROMIO support: yes
> IMPI support: no
> Debug support: no
> Purify clean: no
> SSI base: parameter "verbose" (default value: )
> SSI mpi: parameter "mpi_hostmap" (default value:
>
> "/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/etc/lam-hostmap.txt")
> SSI base: parameter "base_module_path" (default value:
>
> "/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib/lam")
> SSI boot: parameter "boot_verbose" (default value: )
> SSI boot: parameter "boot" (default value: )
> SSI boot: parameter "boot_base_promisc" (default value: "0")
> SSI boot: parameter "boot_base_window_size" (default value: "5")
> SSI boot: parameter "boot_globus_priority" (default value: "3")
> SSI boot: parameter "boot_rsh_username" (default value: )
> SSI boot: parameter "boot_rsh_agent" (default value:
> "/bin/ssh ")
> SSI boot: parameter "boot_rsh_no_n" (default value: "0")
> SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
> SSI boot: parameter "boot_rsh_fast" (default value: "0")
> SSI boot: parameter "boot_rsh_ignore_stderr" (default value:
> "0")
> SSI boot: parameter "boot_rsh_priority" (default value: "10")
> SSI boot: parameter "boot_slurm_priority" (default value: "50")
> SSI rpi: parameter "rpi_verbose" (default value: )
> SSI rpi: parameter "rpi" (default value: )
> SSI rpi: parameter "rpi_crtcp_priority" (default value: "25")
> SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
> SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
> SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
> SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
> SSI rpi: parameter "rpi_sysv_poolsize" (default value:
> "16777216")
> SSI rpi: parameter "rpi_sysv_maxalloc" (default value:
> "1048576")
> SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
> SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
> SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
> SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
> SSI rpi: parameter "rpi_tcp_priority" (default value: "20")
> SSI rpi: parameter "rpi_usysv_readlockpoll" (default value:
> "10000")
> SSI rpi: parameter "rpi_usysv_writelockpoll" (default value:
> "10")
> SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
> SSI rpi: parameter "rpi_usysv_poolsize" (default value:
> "16777216")
> SSI rpi: parameter "rpi_usysv_maxalloc" (default value:
> "1048576")
> SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
> SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
> SSI coll: parameter "coll_verbose" (default value: )
> SSI coll: parameter "coll_shmem" (default value: "0")
> SSI cr: parameter "cr_verbose" (default value: )
> SSI cr: parameter "cr" (default value: )
> SSI cr: parameter "cr_blcr_priority" (default value: "50")
> SSI cr: parameter "cr_self_priority" (default value: "25")
> SSI cr: parameter "cr_self_do_restart" (default value: "0")
> SSI cr: parameter "cr_self_prefix" (default value:
> "lam_cr_self")
> SSI cr: parameter "cr_self_checkpoint" (default value: )
> SSI cr: parameter "cr_self_continue" (default value: )
> SSI cr: parameter "cr_self_restart" (default value: )
>
> Here is my output when running
> [ltaief_at_compute-0-16 lammpi-cg_3D]$ mpirun -np 9 -ssi cr_verbose
> level:1000,stderr -ssi rpi crtcp -ssi cr blcr -x LD_LIBRARY_PATH
> ./main_heat &
>
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/bin/mpif77 -O3 -w
> -I/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/include -o
> objet/main_heat.o -c ./source/main_heat.f
> /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/bin/mpif77 -o
> main_heat ./objet/main_heat.o ./objet/ddot.o ./objet/gather.o
> ./objet/UpdateBoundary.o ./objet/cut_Domain_proc.o ./objet/initF.o
> ./objet/checkpoint.o ./objet/matvec.o ./objet/compute_norm.o
> ./objet/Solve_CG.o ./objet/tempscom.o
> -L/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib
>
> Compilation Successfully Terminated!
>
> n0<6983> ssi:crmpi:open: opening
> n0<6983> ssi:crmpi:open: looking for cr module named blcr
> n0<6983> ssi:crmpi:open: opening cr module blcr
> n0<6983> ssi:crmpi:open: opened cr module blcr
> n0<6983> ssi:crmpi:query: querying cr module blcr
> n0<6983> ssi:crmpi:blcr: module initializing
> n0<6983> ssi:crmpi:blcr:verbose: 1000
> n0<6983> ssi:crmpi:blcr:priority: 50
> n0<6983> ssi:crmpi:query: cr module available: blcr, priority: 50
> n0<6984> ssi:crmpi:open: opening
> n0<6984> ssi:crmpi:open: looking for cr module named blcr
> n0<6984> ssi:crmpi:open: opening cr module blcr
> n0<6984> ssi:crmpi:open: opened cr module blcr
> n0<6984> ssi:crmpi:query: querying cr module blcr
> n0<6984> ssi:crmpi:blcr: module initializing
> n0<6984> ssi:crmpi:blcr:verbose: 1000
> n0<6984> ssi:crmpi:blcr:priority: 50
> n0<6984> ssi:crmpi:query: cr module available: blcr, priority: 50
> n4<6385> ssi:crmpi:open: opening
> n4<6385> ssi:crmpi:open: looking for cr module named blcr
> n4<6385> ssi:crmpi:open: opening cr module blcr
> n4<6385> ssi:crmpi:open: opened cr module blcr
> n4<6385> ssi:crmpi:query: querying cr module blcr
> n4<6385> ssi:crmpi:blcr: module initializing
> n4<6385> ssi:crmpi:blcr:verbose: 1000
> n4<6385> ssi:crmpi:blcr:priority: 50
> n4<6385> ssi:crmpi:query: cr module available: blcr, priority: 50
> n1<15131> ssi:crmpi:open: opening
> n2<3830> ssi:crmpi:open: opening
> n4<6386> ssi:crmpi:open: opening
> n3<25936> ssi:crmpi:open: opening
> n1<15131> ssi:crmpi:open: looking for cr module named blcr
> n1<15131> ssi:crmpi:open: opening cr module blcr
> n1<15131> ssi:crmpi:open: opened cr module blcr
> n1<15131> ssi:crmpi:query: querying cr module blcr
> n1<15131> ssi:crmpi:blcr: module initializing
> n1<15131> ssi:crmpi:blcr:verbose: 1000
> n1<15131> ssi:crmpi:blcr:priority: 50
> n1<15131> ssi:crmpi:query: cr module available: blcr, priority: 50
> n1<15132> ssi:crmpi:open: opening
> n2<3830> ssi:crmpi:open: looking for cr module named blcr
> n2<3830> ssi:crmpi:open: opening cr module blcr
> n2<3830> ssi:crmpi:open: opened cr module blcr
> n2<3830> ssi:crmpi:query: querying cr module blcr
> n2<3830> ssi:crmpi:blcr: module initializing
> n2<3830> ssi:crmpi:blcr:verbose: 1000
> n2<3830> ssi:crmpi:blcr:priority: 50
> n2<3830> ssi:crmpi:query: cr module available: blcr, priority: 50
> n4<6386> ssi:crmpi:open: looking for cr module named blcr
> n4<6386> ssi:crmpi:open: opening cr module blcr
> n4<6386> ssi:crmpi:open: opened cr module blcr
> n4<6386> ssi:crmpi:query: querying cr module blcr
> n4<6386> ssi:crmpi:blcr: module initializing
> n4<6386> ssi:crmpi:blcr:verbose: 1000
> n4<6386> ssi:crmpi:blcr:priority: 50
> n4<6386> ssi:crmpi:query: cr module available: blcr, priority: 50
> n3<25936> ssi:crmpi:open: looking for cr module named blcr
> n3<25936> ssi:crmpi:open: opening cr module blcr
> n3<25936> ssi:crmpi:open: opened cr module blcr
> n1<15132> ssi:crmpi:open: looking for cr module named blcr
> n1<15132> ssi:crmpi:open: opening cr module blcr
> n1<15132> ssi:crmpi:open: opened cr module blcr
> n1<15132> ssi:crmpi:query: querying cr module blcr
> n1<15132> ssi:crmpi:blcr: module initializing
> n1<15132> ssi:crmpi:blcr:verbose: 1000
> n1<15132> ssi:crmpi:blcr:priority: 50
> n1<15132> ssi:crmpi:query: cr module available: blcr, priority: 50
> n3<25936> ssi:crmpi:query: querying cr module blcr
> n3<25936> ssi:crmpi:blcr: module initializing
> n3<25936> ssi:crmpi:blcr:verbose: 1000
> n3<25936> ssi:crmpi:blcr:priority: 50
> n3<25936> ssi:crmpi:query: cr module available: blcr, priority: 50
> n3<25937> ssi:crmpi:open: opening
> n3<25937> ssi:crmpi:open: looking for cr module named blcr
> n3<25937> ssi:crmpi:open: opening cr module blcr
> n3<25937> ssi:crmpi:open: opened cr module blcr
> n3<25937> ssi:crmpi:query: querying cr module blcr
> n3<25937> ssi:crmpi:blcr: module initializing
> n3<25937> ssi:crmpi:blcr:verbose: 1000
> n3<25937> ssi:crmpi:blcr:priority: 50
> n3<25937> ssi:crmpi:query: cr module available: blcr, priority: 50
> n0<6977> ssi:crlam: Opening
> n0<6977> ssi:crlam: looking for module named blcr
> n0<6977> ssi:crlam: opening module blcr
> n0<6977> ssi:crlam: query module blcr
> n0<6977> ssi:crlam:blcr: module initializing
> n0<6977> ssi:crlam:blcr:verbose: 1000
> n0<6977> ssi:crlam:blcr:priority: 50
> n0<6977> ssi:crlam: Selected crlam module "blcr"
> n0<6977> ssi:crlam:Registered C/R handlers
> n0<6983> ssi:crmpi: initializing
> n0<6984> ssi:crmpi: initializing
> n0<6984> ssi:crmpi: CR support enabled (blcr)
> n1<15131> ssi:crmpi: initializing
> n0<6983> ssi:crmpi: CR support enabled (blcr)
> n1<15132> ssi:crmpi: initializing
> n2<3830> ssi:crmpi: initializing
> n3<25936> ssi:crmpi: initializing
> n2<3830> ssi:crmpi: CR support enabled (blcr)
> n1<15131> ssi:crmpi: CR support enabled (blcr)
> n3<25937> ssi:crmpi: initializing
> n4<6386> ssi:crmpi: initializing
> n3<25936> ssi:crmpi: CR support enabled (blcr)
> n4<6385> ssi:crmpi: initializing
> n4<6386> ssi:crmpi: CR support enabled (blcr)
> n1<15132> ssi:crmpi: CR support enabled (blcr)
> n3<25937> ssi:crmpi: CR support enabled (blcr)
> n4<6385> ssi:crmpi: CR support enabled (blcr)
> me= 0Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 1Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 7Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 4Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 8Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 5Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 6Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 2Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> me= 3Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
> Execution Successfully Terminated!
> End of solving
> Time of Solving 3.88168907
> n0<6983> ssi:crmpi: Closing
> n0<6984> ssi:crmpi: Closing
> n2<3830> ssi:crmpi: Closing
> n4<6386> ssi:crmpi: Closing
> n3<25937> ssi:crmpi: Closing
> n1<15131> ssi:crmpi: Closing
> n1<15132> ssi:crmpi: Closing
> n4<6385> ssi:crmpi: Closing
> n3<25936> ssi:crmpi: Closing
>
>
> When I use cr_checkpoint command to checkpoint the mpirun process during
> the execution, it creates a context.PID file in my CWD.
> Then, after the program is finished I want to restart it by cr_restart
> context.PID.
> And I get this error:
> [ltaief_at_compute-0-16 lammpi-cg_3D]$ cr_restart context.6977
> mpirun: cannot start ./main_heat on n2: Bad file descriptor
>
> Any Ideas?
>
> Thanks and best regards,
> Hatem
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/