LAM/MPI General User's Mailing List Archives

From: David Tabernero Pérez (david.tabernero_at_[hidden])
Date: 2005-12-14 08:52:42


Hello.

    I have a similar problem. Is it necessary to turn off migration on all
nodes, or only on the master node?

    Thanks a lot.

Jeff Squyres wrote:

>Mosix is going to have quite a few problems when trying to run MPI
>applications. They may handle some of the issues by leaving socket
>forwarding agents around, but in general LAM is not set up to handle
>Mosix migration.
>
>In particular, note that LAM/MPI is composed of two parts -- the run-
>time environment and the MPI application. If Mosix is migrating
>these indiscriminately, Bad Things will happen (I can explain more if
>you'd like). Try turning off migration/Mosix and see if the problem
>disappears.
>
>On Dec 10, 2005, at 4:03 PM, M.Kondrin wrote:
>
>
>
>>Hello!
>>I am quite new to lam-mpi and have just installed it on a cluster
>>consisting of 8 AMD machines with Linux/OpenMosix-2.4.26. I have two
>>networks there: the slow administrative one and the fast one for lam
>>data transfers. The second network is reserved for lam by populating
>>/etc/lam-bhost.def with the machine names assigned to the cards in the
>>gigabit network. Everywhere else in the system, traffic is routed to
>>the first (administrative) network. It works almost fine, except with
>>programs which call MPI_Comm_spawn (for example spawn.c and
>>spawn_multiple.c from the lamtests suite). If LAM_MPI_SSI_rpi is set to
>>a value other than "lamd", the programs hang at this call and can be
>>terminated only with lamhalt or Ctrl+C. How often this happens depends
>>on the number of lamd daemons booted: with fewer daemons the hangs are
>>less frequent. When I set the environment variable LAM_MPI_SSI_rpi to
>>lamd there are no hangs, but the response time is worse. I have tested
>>this with lam-7.1.1 and lam-7.1.2b29.
>>Is there a fix for this bug so that rpi:tcp can be used with
>>MPI_Comm_spawn? The rpi:tcp module is required, AFAIK, for Mosix to do
>>load balancing and migrate processes between nodes.
>>Thank you
>>M.Kondrin
>>PS lam was configured with:
>>CFLAGS="-O2 -march=i486 -mcpu=i686" ./configure --prefix=/usr \
>>    --sysconfdir=/etc --localstatedir=/var --enable-shared=yes \
>>    --with-rpi=tcp --with-modules --with-trillium
>>The boot module is rsh (I have kerberized rsh on the boxes). Lam
>>executables are linked with:
>>ldd /usr/bin/mpirun
>> liblam.so.0 => /usr/lib/liblam.so.0 (0x40024000)
>> libdl.so.2 => /lib/libdl.so.2 (0x4006e000)
>> libutil.so.1 => /lib/libutil.so.1 (0x40072000)
>> libpthread.so.0 => /lib/libpthread.so.0 (0x40076000)
>> libc.so.6 => /lib/libc.so.6 (0x400c7000)
>> /lib/ld-linux.so.2 (0x40000000)
>>
>>laminfo -all :
>> LAM/MPI: 7.1.2b29
>> SSI boot: globus (SSI v1.0, API v1.1, Module v0.6)
>> SSI boot: rsh (SSI v1.0, API v1.1, Module v1.1)
>> SSI boot: slurm (SSI v1.0, API v1.1, Module v1.0)
>> SSI coll: lam_basic (SSI v1.0, API v1.1, Module v7.1)
>> SSI coll: shmem (SSI v1.0, API v1.1, Module v1.0)
>> SSI coll: smp (SSI v1.0, API v1.1, Module v1.2)
>> SSI rpi: crtcp (SSI v1.0, API v1.1, Module v1.1)
>> SSI rpi: lamd (SSI v1.0, API v1.0, Module v7.1)
>> SSI rpi: tcp (SSI v1.0, API v1.0, Module v7.1)
>> SSI rpi: sysv (SSI v1.0, API v1.0, Module v7.1)
>> SSI rpi: usysv (SSI v1.0, API v1.0, Module v7.1)
>> SSI cr: self (SSI v1.0, API v1.0, Module v1.0)
>> Prefix: /usr
>> Bindir: /usr/bin
>> Libdir: /usr/lib
>> Incdir: /usr/include
>> Pkglibdir: /usr/lib/lam
>> Sysconfdir: /etc
>> Architecture: i686-pc-linux-gnu
>> Configured by: root
>> Configured on: Sat Dec 10 13:24:55 UTC 2005
>> Configure host: alpha....
>> Memory manager: ptmalloc2
>> C bindings: yes
>> C++ bindings: yes
>> Fortran bindings: yes
>> C compiler: gcc
>> C char size: 1
>> C bool size: 1
>> C short size: 2
>> C int size: 4
>> C long size: 4
>> C float size: 4
>> C double size: 8
>> C pointer size: 4
>> C char align: 1
>> C bool align: 1
>> C int align: 4
>> C float align: 4
>> C double align: 4
>> C++ compiler: g++
>> Fortran compiler: g77
>> Fortran symbols: double_underscore
>> Fort integer size: 4
>> Fort real size: 4
>> Fort dbl prec size: 4
>> Fort cplx size: 4
>> Fort dbl cplx size: 4
>> Fort integer align: 4
>> Fort real align: 4
>> Fort dbl prec align: 4
>> Fort cplx align: 4
>> Fort dbl cplx align: 4
>> C profiling: yes
>> C++ profiling: yes
>> Fortran profiling: yes
>> C++ exceptions: no
>> Thread support: yes
>> ROMIO support: yes
>> IMPI support: no
>> Debug support: no
>> Purify clean: no
>> SSI base: parameter "verbose" (default value: <none>)
>> SSI mpi: parameter "mpi_hostmap" (default value: "/etc/lam-hostmap.txt")
>> SSI base: parameter "base_module_path" (default value: "/usr/lib/lam")
>> SSI boot: parameter "boot_verbose" (default value: <none>)
>> SSI boot: parameter "boot" (default value: <none>)
>> SSI boot: parameter "boot_base_promisc" (default value: "0")
>> SSI boot: parameter "boot_base_window_size" (default value: "5")
>> SSI boot: parameter "boot_globus_priority" (default value: "3")
>> SSI boot: parameter "boot_rsh_username" (default value: <none>)
>> SSI boot: parameter "boot_rsh_agent" (default value: "rsh")
>> SSI boot: parameter "boot_rsh_no_n" (default value: "0")
>> SSI boot: parameter "boot_rsh_no_profile" (default value: "0")
>> SSI boot: parameter "boot_rsh_fast" (default value: "0")
>> SSI boot: parameter "boot_rsh_ignore_stderr" (default value: "0")
>> SSI boot: parameter "boot_rsh_priority" (default value: "10")
>> SSI boot: parameter "boot_slurm_priority" (default value: "50")
>> SSI rpi: parameter "rpi_verbose" (default value: <none>)
>> SSI rpi: parameter "rpi" (default value: <none>)
>> SSI rpi: parameter "rpi_crtcp_priority" (default value: "25")
>> SSI rpi: parameter "rpi_crtcp_short" (default value: "65536")
>> SSI rpi: parameter "rpi_crtcp_sockbuf" (default value: "-1")
>> SSI rpi: parameter "rpi_lamd_priority" (default value: "20")
>> SSI rpi: parameter "rpi_tcp_short" (default value: "65536")
>> SSI rpi: parameter "rpi_tcp_sockbuf" (default value: "-1")
>> SSI rpi: parameter "rpi_tcp_priority" (default value: "75")
>> SSI rpi: parameter "rpi_sysv_pollyield" (default value: "1")
>> SSI rpi: parameter "rpi_sysv_poolsize" (default value: "16777216")
>> SSI rpi: parameter "rpi_sysv_maxalloc" (default value: "1048576")
>> SSI rpi: parameter "rpi_sysv_short" (default value: "8192")
>> SSI rpi: parameter "rpi_sysv_priority" (default value: "30")
>> SSI rpi: parameter "rpi_usysv_readlockpoll" (default value: "10000")
>> SSI rpi: parameter "rpi_usysv_writelockpoll" (default value: "10")
>> SSI rpi: parameter "rpi_usysv_pollyield" (default value: "1")
>> SSI rpi: parameter "rpi_usysv_poolsize" (default value: "16777216")
>> SSI rpi: parameter "rpi_usysv_maxalloc" (default value: "1048576")
>> SSI rpi: parameter "rpi_usysv_short" (default value: "8192")
>> SSI rpi: parameter "rpi_usysv_priority" (default value: "40")
>> SSI coll: parameter "coll_verbose" (default value: <none>)
>> SSI coll: parameter "coll_shmem" (default value: "0")
>> SSI cr: parameter "cr_verbose" (default value: <none>)
>> SSI cr: parameter "cr" (default value: <none>)
>> SSI cr: parameter "cr_self_priority" (default value: "25")
>> SSI cr: parameter "cr_self_do_restart" (default value: "0")
>> SSI cr: parameter "cr_self_prefix" (default value: "lam_cr_self")
>> SSI cr: parameter "cr_self_checkpoint" (default value: <none>)
>> SSI cr: parameter "cr_self_continue" (default value: <none>)
>> SSI cr: parameter "cr_self_restart" (default value: <none>)
>>
>>
>>
>>_______________________________________________
>>This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>>
>
>
>--
>{+} Jeff Squyres
>{+} The Open MPI Project
>{+} http://www.open-mpi.org/
>
>
>
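
The rpi priorities in the quoted laminfo listing hint at why the tcp module is used unless LAM_MPI_SSI_rpi overrides it. The following is only a minimal Python sketch, under the assumption that LAM picks the available rpi module with the highest priority when none is requested explicitly. The helper names rpi_priorities and default_rpi are hypothetical and not part of LAM; the sample lines are copied from the listing in the thread.

```python
import re

# Sample lines copied from the "laminfo -all" output quoted in the thread.
LAMINFO_SAMPLE = '''\
>> SSI rpi: parameter "rpi_crtcp_priority" (default
>>value: "25")
>> SSI rpi: parameter "rpi_lamd_priority" (default value:
>>"20")
>> SSI rpi: parameter "rpi_tcp_priority" (default value:
>>"75")
>> SSI rpi: parameter "rpi_sysv_priority" (default value:
>>"30")
>> SSI rpi: parameter "rpi_usysv_priority" (default
>>value: "40")
'''

PRIORITY_RE = re.compile(
    r'parameter "rpi_(\w+)_priority" \(default value: "(\d+)"\)'
)


def rpi_priorities(text):
    """Return {module name: priority} parsed from laminfo-style text.

    Leading mail quote markers (">") are stripped and wrapped lines are
    rejoined, so the listing can be pasted straight from an email.
    """
    lines = [re.sub(r'^\s*>+', '', line) for line in text.splitlines()]
    flat = " ".join(" ".join(lines).split())
    return {name: int(value) for name, value in PRIORITY_RE.findall(flat)}


def default_rpi(priorities):
    """Assumption: the module with the highest priority is selected."""
    return max(priorities, key=priorities.get)


if __name__ == "__main__":
    prios = rpi_priorities(LAMINFO_SAMPLE)
    print(default_rpi(prios))  # prints "tcp"
```

Under that assumption, this build (configured with --with-rpi=tcp) would fall back to tcp, and setting LAM_MPI_SSI_rpi to lamd, as described in the original message, would be an explicit override of that choice.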