LAM/MPI General User's Mailing List Archives

From: hatem ltaief (ltaief_at_[hidden])
Date: 2006-04-20 21:19:46


Hi,
I installed BLCR and LAM/MPI 7.1.2. Here is the laminfo output:
[ltaief_at_compute-0-16 lammpi-cg_3D]$ laminfo -all
             LAM/MPI: 7.1.2
            SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
            SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
            SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
            SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
            SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
            SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
             SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
             SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
             SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
             SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
             SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
              SSI cr: blcr (SSI v1.0, API v1.0, Module v1.1)
              SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
              Prefix: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77
              Bindir: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/bin
              Libdir: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib
              Incdir: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/include
           Pkglibdir: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib/lam
          Sysconfdir: /home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/etc
        Architecture: x86_64-unknown-linux-gnu
       Configured by: ltaief
       Configured on: Thu Apr 20 19:37:33 CDT 2006
      Configure host: medusa.tlc2.uh.edu
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: gcc
         C char size: 1
         C bool size: 1
        C short size: 2
          C int size: 4
         C long size: 8
        C float size: 4
       C double size: 8
      C pointer size: 8
        C char align: 1
        C bool align: 1
         C int align: 4
       C float align: 4
      C double align: 8
        C++ compiler: g++
    Fortran compiler: g77
     Fortran symbols: double_underscore
   Fort integer size: 4
      Fort real size: 4
  Fort dbl prec size: 4
      Fort cplx size: 4
  Fort dbl cplx size: 4
  Fort integer align: 4
     Fort real align: 4
 Fort dbl prec align: 4
     Fort cplx align: 4
 Fort dbl cplx align: 4
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI base: parameter "verbose" (default value: <none>)
             SSI mpi: parameter "mpi_hostmap" (default value:
                      "/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/etc/lam-hostmap.txt")
            SSI base: parameter "base_module_path" (default value:
                      "/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib/lam")
            SSI boot: parameter "boot_verbose" (default value: <none>)
            SSI boot: parameter "boot" (default value: <none>)
            SSI boot: parameter "boot_base_promisc" (default value: "0")
            SSI boot: parameter "boot_base_window_size" (default value: "5")
            SSI boot: parameter "boot_globus_priority" (default value: "3")
            SSI boot: parameter "boot_rsh_username" (default value: <none>)
            SSI boot: parameter "boot_rsh_agent" (default value:
                      "/bin/ssh ")
            SSI boot: parameter "boot_rsh_no_n" (default value: "0")
            SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
            SSI boot: parameter "boot_rsh_fast" (default value: "0")
            SSI boot: parameter "boot_rsh_ignore_stderr" (default value:
                      "0")
            SSI boot: parameter "boot_rsh_priority" (default value: "10")
            SSI boot: parameter "boot_slurm_priority" (default value: "50")
             SSI rpi: parameter "rpi_verbose" (default value: <none>)
             SSI rpi: parameter "rpi" (default value: <none>)
             SSI rpi: parameter "rpi_crtcp_priority" (default value: "25")
             SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
             SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
             SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
             SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
             SSI rpi: parameter "rpi_sysv_poolsize" (default value:
                      "16777216")
             SSI rpi: parameter "rpi_sysv_maxalloc" (default value:
                      "1048576")
             SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
             SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
             SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
             SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
             SSI rpi: parameter "rpi_tcp_priority" (default value: "20")
             SSI rpi: parameter "rpi_usysv_readlockpoll" (default value:
                      "10000")
             SSI rpi: parameter "rpi_usysv_writelockpoll" (default value:
                      "10")
             SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
             SSI rpi: parameter "rpi_usysv_poolsize" (default value:
                      "16777216")
             SSI rpi: parameter "rpi_usysv_maxalloc" (default value:
                      "1048576")
             SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
             SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
            SSI coll: parameter "coll_verbose" (default value: <none>)
            SSI coll: parameter "coll_shmem" (default value: "0")
              SSI cr: parameter "cr_verbose" (default value: <none>)
              SSI cr: parameter "cr" (default value: <none>)
              SSI cr: parameter "cr_blcr_priority" (default value: "50")
              SSI cr: parameter "cr_self_priority" (default value: "25")
              SSI cr: parameter "cr_self_do_restart" (default value: "0")
              SSI cr: parameter "cr_self_prefix" (default value:
                      "lam_cr_self")
              SSI cr: parameter "cr_self_checkpoint" (default value: <none>)
              SSI cr: parameter "cr_self_continue" (default value: <none>)
              SSI cr: parameter "cr_self_restart" (default value: <none>)

Here is the output I get when I run the program:
[ltaief_at_compute-0-16 lammpi-cg_3D]$ mpirun -np 9 -ssi cr_verbose level:1000,stderr -ssi rpi crtcp -ssi cr blcr -x LD_LIBRARY_PATH ./main_heat &

/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/bin/mpif77 -O3 -w
-I/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/include -o
objet/main_heat.o -c ./source/main_heat.f
/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/bin/mpif77 -o
main_heat ./objet/main_heat.o ./objet/ddot.o ./objet/gather.o
./objet/UpdateBoundary.o ./objet/cut_Domain_proc.o ./objet/initF.o
./objet/checkpoint.o ./objet/matvec.o ./objet/compute_norm.o
./objet/Solve_CG.o ./objet/tempscom.o
-L/home/l/ltaief/source_lammpi/lam-src-f77/COMPILE_F77/lib

Compilation Successfully Terminated!

n0<6983> ssi:crmpi:open: opening
n0<6983> ssi:crmpi:open: looking for cr module named blcr
n0<6983> ssi:crmpi:open: opening cr module blcr
n0<6983> ssi:crmpi:open: opened cr module blcr
n0<6983> ssi:crmpi:query: querying cr module blcr
n0<6983> ssi:crmpi:blcr: module initializing
n0<6983> ssi:crmpi:blcr:verbose: 1000
n0<6983> ssi:crmpi:blcr:priority: 50
n0<6983> ssi:crmpi:query: cr module available: blcr, priority: 50
n0<6984> ssi:crmpi:open: opening
n0<6984> ssi:crmpi:open: looking for cr module named blcr
n0<6984> ssi:crmpi:open: opening cr module blcr
n0<6984> ssi:crmpi:open: opened cr module blcr
n0<6984> ssi:crmpi:query: querying cr module blcr
n0<6984> ssi:crmpi:blcr: module initializing
n0<6984> ssi:crmpi:blcr:verbose: 1000
n0<6984> ssi:crmpi:blcr:priority: 50
n0<6984> ssi:crmpi:query: cr module available: blcr, priority: 50
n4<6385> ssi:crmpi:open: opening
n4<6385> ssi:crmpi:open: looking for cr module named blcr
n4<6385> ssi:crmpi:open: opening cr module blcr
n4<6385> ssi:crmpi:open: opened cr module blcr
n4<6385> ssi:crmpi:query: querying cr module blcr
n4<6385> ssi:crmpi:blcr: module initializing
n4<6385> ssi:crmpi:blcr:verbose: 1000
n4<6385> ssi:crmpi:blcr:priority: 50
n4<6385> ssi:crmpi:query: cr module available: blcr, priority: 50
n1<15131> ssi:crmpi:open: opening
n2<3830> ssi:crmpi:open: opening
n4<6386> ssi:crmpi:open: opening
n3<25936> ssi:crmpi:open: opening
n1<15131> ssi:crmpi:open: looking for cr module named blcr
n1<15131> ssi:crmpi:open: opening cr module blcr
n1<15131> ssi:crmpi:open: opened cr module blcr
n1<15131> ssi:crmpi:query: querying cr module blcr
n1<15131> ssi:crmpi:blcr: module initializing
n1<15131> ssi:crmpi:blcr:verbose: 1000
n1<15131> ssi:crmpi:blcr:priority: 50
n1<15131> ssi:crmpi:query: cr module available: blcr, priority: 50
n1<15132> ssi:crmpi:open: opening
n2<3830> ssi:crmpi:open: looking for cr module named blcr
n2<3830> ssi:crmpi:open: opening cr module blcr
n2<3830> ssi:crmpi:open: opened cr module blcr
n2<3830> ssi:crmpi:query: querying cr module blcr
n2<3830> ssi:crmpi:blcr: module initializing
n2<3830> ssi:crmpi:blcr:verbose: 1000
n2<3830> ssi:crmpi:blcr:priority: 50
n2<3830> ssi:crmpi:query: cr module available: blcr, priority: 50
n4<6386> ssi:crmpi:open: looking for cr module named blcr
n4<6386> ssi:crmpi:open: opening cr module blcr
n4<6386> ssi:crmpi:open: opened cr module blcr
n4<6386> ssi:crmpi:query: querying cr module blcr
n4<6386> ssi:crmpi:blcr: module initializing
n4<6386> ssi:crmpi:blcr:verbose: 1000
n4<6386> ssi:crmpi:blcr:priority: 50
n4<6386> ssi:crmpi:query: cr module available: blcr, priority: 50
n3<25936> ssi:crmpi:open: looking for cr module named blcr
n3<25936> ssi:crmpi:open: opening cr module blcr
n3<25936> ssi:crmpi:open: opened cr module blcr
n1<15132> ssi:crmpi:open: looking for cr module named blcr
n1<15132> ssi:crmpi:open: opening cr module blcr
n1<15132> ssi:crmpi:open: opened cr module blcr
n1<15132> ssi:crmpi:query: querying cr module blcr
n1<15132> ssi:crmpi:blcr: module initializing
n1<15132> ssi:crmpi:blcr:verbose: 1000
n1<15132> ssi:crmpi:blcr:priority: 50
n1<15132> ssi:crmpi:query: cr module available: blcr, priority: 50
n3<25936> ssi:crmpi:query: querying cr module blcr
n3<25936> ssi:crmpi:blcr: module initializing
n3<25936> ssi:crmpi:blcr:verbose: 1000
n3<25936> ssi:crmpi:blcr:priority: 50
n3<25936> ssi:crmpi:query: cr module available: blcr, priority: 50
n3<25937> ssi:crmpi:open: opening
n3<25937> ssi:crmpi:open: looking for cr module named blcr
n3<25937> ssi:crmpi:open: opening cr module blcr
n3<25937> ssi:crmpi:open: opened cr module blcr
n3<25937> ssi:crmpi:query: querying cr module blcr
n3<25937> ssi:crmpi:blcr: module initializing
n3<25937> ssi:crmpi:blcr:verbose: 1000
n3<25937> ssi:crmpi:blcr:priority: 50
n3<25937> ssi:crmpi:query: cr module available: blcr, priority: 50
n0<6977> ssi:crlam: Opening
n0<6977> ssi:crlam: looking for module named blcr
n0<6977> ssi:crlam: opening module blcr
n0<6977> ssi:crlam: query module blcr
n0<6977> ssi:crlam:blcr: module initializing
n0<6977> ssi:crlam:blcr:verbose: 1000
n0<6977> ssi:crlam:blcr:priority: 50
n0<6977> ssi:crlam: Selected crlam module "blcr"
n0<6977> ssi:crlam:Registered C/R handlers
n0<6983> ssi:crmpi: initializing
n0<6984> ssi:crmpi: initializing
n0<6984> ssi:crmpi: CR support enabled (blcr)
n1<15131> ssi:crmpi: initializing
n0<6983> ssi:crmpi: CR support enabled (blcr)
n1<15132> ssi:crmpi: initializing
n2<3830> ssi:crmpi: initializing
n3<25936> ssi:crmpi: initializing
n2<3830> ssi:crmpi: CR support enabled (blcr)
n1<15131> ssi:crmpi: CR support enabled (blcr)
n3<25937> ssi:crmpi: initializing
n4<6386> ssi:crmpi: initializing
n3<25936> ssi:crmpi: CR support enabled (blcr)
n4<6385> ssi:crmpi: initializing
n4<6386> ssi:crmpi: CR support enabled (blcr)
n1<15132> ssi:crmpi: CR support enabled (blcr)
n3<25937> ssi:crmpi: CR support enabled (blcr)
n4<6385> ssi:crmpi: CR support enabled (blcr)
 me= 0Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 1Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 7Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 4Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 8Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 5Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 6Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 2Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
 me= 3Nx= 26Ny= 26Nz= 50maxT= 1000dt= 0.00159999993
Execution Successfully Terminated!
End of solving
 Time of Solving 3.88168907
n0<6983> ssi:crmpi: Closing
n0<6984> ssi:crmpi: Closing
n2<3830> ssi:crmpi: Closing
n4<6386> ssi:crmpi: Closing
n3<25937> ssi:crmpi: Closing
n1<15131> ssi:crmpi: Closing
n1<15132> ssi:crmpi: Closing
n4<6385> ssi:crmpi: Closing
n3<25936> ssi:crmpi: Closing

When I run the cr_checkpoint command on the mpirun process during
execution, it creates a context.PID file in my current working directory.
After the program has finished, I try to restart it with cr_restart
context.PID, and I get this error:
[ltaief_at_compute-0-16 lammpi-cg_3D]$ cr_restart context.6977
mpirun: cannot start ./main_heat on n2: Bad file descriptor
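
To make the sequence concrete, here is a minimal sketch of what I am doing, written as a small shell script. The helper names context_file_for and restart_from_checkpoint are hypothetical (they are not part of BLCR or LAM/MPI). The script assumes that cr_checkpoint and cr_restart are on the PATH and that the context file is named context.<PID> in the current directory, as in my run above.

```shell
#!/bin/sh
# Sketch of the checkpoint/restart sequence described above.
# Assumption: BLCR wrote the checkpoint as context.<PID> in the
# current directory, as it did in my run.

# Hypothetical helper: name of the context file for a given mpirun PID.
context_file_for() {
    printf 'context.%s\n' "$1"
}

# Hypothetical helper: call cr_restart only if the context file exists,
# so a missing checkpoint is reported before BLCR is invoked.
restart_from_checkpoint() {
    ctx=$(context_file_for "$1")
    if [ ! -f "$ctx" ]; then
        echo "no checkpoint file: $ctx" >&2
        return 1
    fi
    cr_restart "$ctx"
}

# Intended usage (shown as comments, not executed here):
#   mpirun -np 9 -ssi rpi crtcp -ssi cr blcr -x LD_LIBRARY_PATH ./main_heat &
#   MPIRUN_PID=$!
#   cr_checkpoint "$MPIRUN_PID"          # writes context.$MPIRUN_PID
#   wait "$MPIRUN_PID"                   # let the original run finish
#   restart_from_checkpoint "$MPIRUN_PID"
```

In my case the PID of mpirun was 6977, so the last step corresponds to the cr_restart context.6977 call shown above.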

Any ideas?

Thanks and best regards,
Hatem

        

        
                