LAM/MPI General User's Mailing List Archives

From: M.Kondrin (mkondrin_at_[hidden])
Date: 2005-12-10 16:03:54


Hello!
I am quite new to LAM/MPI and have just installed it on a cluster
consisting of 8 AMD machines running Linux/OpenMosix-2.4.26. There are
two networks: a slow administrative one and a fast gigabit one for LAM
data transfers. The second network is reserved for LAM by populating
/etc/lam-bhost.def with the host names assigned to the cards on the
gigabit network. Everywhere else on the systems, traffic is routed to the
first (administrative) network. This works almost fine, except with
programs that call MPI_Comm_spawn (for example, spawn.c and
spawn_multiple.c from the lamtests suite). If LAM_MPI_SSI_rpi is set to
any value other than "lamd", the programs hang at this call and can be
terminated only with lamhalt or Ctrl+C. The behavior depends on the
number of lamd daemons booted: with fewer daemons the hangs are less
frequent. When I set the environment variable LAM_MPI_SSI_rpi to lamd,
there are no hangs, but the response time is worse. I have tested this
with lam-7.1.1 and lam-7.1.2b29.
Is there a fix for this bug so that rpi:tcp can be used with
MPI_Comm_spawn? AFAIK, the rpi:tcp module is required for Mosix to do
load balancing and migrate processes between nodes.
Thank you
M.Kondrin
PS: LAM was configured with:
CFLAGS="-O2 -march=i486 -mcpu=i686" ./configure --prefix=/usr \
    --sysconfdir=/etc --localstatedir=/var --enable-shared=yes \
    --with-rpi=tcp --with-modules --with-trillium
The boot module is rsh (I have kerberized rsh on the boxes). The LAM
executables are linked against:
ldd /usr/bin/mpirun
        liblam.so.0 => /usr/lib/liblam.so.0 (0x40024000)
        libdl.so.2 => /lib/libdl.so.2 (0x4006e000)
        libutil.so.1 => /lib/libutil.so.1 (0x40072000)
        libpthread.so.0 => /lib/libpthread.so.0 (0x40076000)
        libc.so.6 => /lib/libc.so.6 (0x400c7000)
        /lib/ld-linux.so.2 (0x40000000)

laminfo -all:
             LAM/MPI: 7.1.2b29
            SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
            SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
            SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
            SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
            SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
            SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
             SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
             SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
             SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
             SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
             SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
              SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
              Prefix: /usr
              Bindir: /usr/bin
              Libdir: /usr/lib
              Incdir: /usr/include
           Pkglibdir: /usr/lib/lam
          Sysconfdir: /etc
        Architecture: i686-pc-linux-gnu
       Configured by: root
       Configured on: Sat Dec 10 13:24:55 UTC 2005
      Configure host: alpha....
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: gcc
         C char size: 1
         C bool size: 1
        C short size: 2
          C int size: 4
         C long size: 4
        C float size: 4
       C double size: 8
      C pointer size: 4
        C char align: 1
        C bool align: 1
         C int align: 4
       C float align: 4
      C double align: 4
        C++ compiler: g++
    Fortran compiler: g77
     Fortran symbols: double_underscore
   Fort integer size: 4
      Fort real size: 4
  Fort dbl prec size: 4
      Fort cplx size: 4
  Fort dbl cplx size: 4
  Fort integer align: 4
     Fort real align: 4
Fort dbl prec align: 4
     Fort cplx align: 4
Fort dbl cplx align: 4
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI base: parameter "verbose" (default value: <none>)
             SSI mpi: parameter "mpi_hostmap" (default value:
                      "/etc/lam-hostmap.txt")
            SSI base: parameter "base_module_path" (default value:
                      "/usr/lib/lam")
            SSI boot: parameter "boot_verbose" (default value: <none>)
            SSI boot: parameter "boot" (default value: <none>)
            SSI boot: parameter "boot_base_promisc" (default value: "0")
            SSI boot: parameter "boot_base_window_size" (default value: "5")
            SSI boot: parameter "boot_globus_priority" (default value: "3")
            SSI boot: parameter "boot_rsh_username" (default value: <none>)
            SSI boot: parameter "boot_rsh_agent" (default value: "rsh")
            SSI boot: parameter "boot_rsh_no_n" (default value: "0")
            SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
            SSI boot: parameter "boot_rsh_fast" (default value: "0")
            SSI boot: parameter "boot_rsh_ignore_stderr" (default value:
                      "0")
            SSI boot: parameter "boot_rsh_priority" (default value: "10")
            SSI boot: parameter "boot_slurm_priority" (default value: "50")
             SSI rpi: parameter "rpi_verbose" (default value: <none>)
             SSI rpi: parameter "rpi" (default value: <none>)
             SSI rpi: parameter "rpi_crtcp_priority" (default value: "25")
             SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
             SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
             SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
             SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
             SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
             SSI rpi: parameter "rpi_tcp_priority" (default value: "75")
             SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
             SSI rpi: parameter "rpi_sysv_poolsize" (default value:
                      "16777216")
             SSI rpi: parameter "rpi_sysv_maxalloc" (default value:
                      "1048576")
             SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
             SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
             SSI rpi: parameter "rpi_usysv_readlockpoll" (default value:
                      "10000")
             SSI rpi: parameter "rpi_usysv_writelockpoll" (default value:
                      "10")
             SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
             SSI rpi: parameter "rpi_usysv_poolsize" (default value:
                      "16777216")
             SSI rpi: parameter "rpi_usysv_maxalloc" (default value:
                      "1048576")
             SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
             SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
            SSI coll: parameter "coll_verbose" (default value: <none>)
            SSI coll: parameter "coll_shmem" (default value: "0")
              SSI cr: parameter "cr_verbose" (default value: <none>)
              SSI cr: parameter "cr" (default value: <none>)
              SSI cr: parameter "cr_self_priority" (default value: "25")
              SSI cr: parameter "cr_self_do_restart" (default value: "0")
              SSI cr: parameter "cr_self_prefix" (default value:
                      "lam_cr_self")
              SSI cr: parameter "cr_self_checkpoint" (default value: <none>)
              SSI cr: parameter "cr_self_continue" (default value: <none>)
              SSI cr: parameter "cr_self_restart" (default value: <none>)
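The rpi_*_priority defaults above suggest why tcp is picked whenever LAM_MPI_SSI_rpi is left unset: it has the highest priority (75). The following is a minimal C sketch of that selection, assuming (as the priority parameters suggest) that an explicitly requested module wins and that otherwise the highest-priority module is chosen. The select_rpi function and the module table are hypothetical illustrations, not LAM source code, and the sketch ignores the runtime availability checks that real modules perform.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of priority-based RPI selection.
 * Priorities are the defaults reported by laminfo above; real LAM
 * modules also check at startup whether they can run, which this
 * sketch does not model. */
struct rpi_module {
    const char *name;
    int priority;
};

static const struct rpi_module modules[] = {
    {"crtcp", 25},
    {"lamd", 20},
    {"tcp", 75},
    {"sysv", 30},
    {"usysv", 40},
};

static const size_t n_modules = sizeof modules / sizeof modules[0];

/* If `requested` is non-NULL (e.g. the value of LAM_MPI_SSI_rpi),
 * return that module's name, or NULL if no such module exists.
 * If `requested` is NULL, return the highest-priority module. */
const char *select_rpi(const char *requested)
{
    if (requested != NULL) {
        for (size_t i = 0; i < n_modules; i++)
            if (strcmp(modules[i].name, requested) == 0)
                return modules[i].name;
        return NULL; /* unknown module explicitly requested */
    }

    const struct rpi_module *best = &modules[0];
    for (size_t i = 1; i < n_modules; i++)
        if (modules[i].priority > best->priority)
            best = &modules[i];
    return best->name;
}
```

Called as select_rpi(getenv("LAM_MPI_SSI_rpi")), the sketch yields "tcp" when the variable is unset (getenv returns NULL) and "lamd" when it is set to lamd, matching the two configurations compared in the message.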