LAM/MPI General User's Mailing List Archives

From: manumachu reddy (manumachu.reddy_at_[hidden])
Date: 2004-11-08 08:37:00


    Hi,

    I have LAM 7.1.1 installed on a Solaris machine and on a Linux
machine. The installation details are shown below:

Linux machine
---------------------
$uname -a
Linux pg1cluster01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
$laminfo
             LAM/MPI: 7.1.1
              Prefix: /home/cs/manredd/lam-7.1.1/lam-7.1.1/LAM-Linux-2.6.8-1.521smp
        Architecture: i686-pc-linux-gnu
       Configured by: manredd
       Configured on: Mon Nov 1 10:50:20 GMT 2004
      Configure host: pg1cluster01
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: gcc
        C++ compiler: g++
    Fortran compiler: g77
     Fortran symbols: double_underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: self (API v1.0, Module v1.0)

Solaris machine
---------------------
$uname -a
SunOS csultra01 5.9 Generic_112233-10 sun4u sparc SUNW,Ultra-5_10
$laminfo
             LAM/MPI: 7.1.1
              Prefix: /home/cs/manredd/lam-7.1.1/lam-7.1.1/LAM-SunOS-5.9
        Architecture: sparc-sun-solaris2.9
       Configured by:
       Configured on: Tue Nov 2 14:19:31 GMT 2004
      Configure host: csultra01
      Memory manager: none
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: gcc
        C++ compiler: g++
    Fortran compiler: g77
     Fortran symbols: double_underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: self (API v1.0, Module v1.0)

   I have a boot schema (host file) for lamboot that lists both machines.

$cat $HOME/lamtopo/Linux_Solaris
pg1cluster01
csultra01
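
For reference, a boot schema of this kind is a plain list of host names, one per line. A minimal shell sketch for listing the hosts in such a file follows. The function name `schema_hosts` is hypothetical (it is not part of LAM), and the sketch assumes that `#` starts a comment and that anything after the first field on a line (for example a `cpu=` count) can be ignored.

```shell
#!/bin/sh
# schema_hosts FILE: print the host name (first field) of every
# non-blank, non-comment line in a boot schema file.
# Hypothetical helper for illustration; not part of LAM.
schema_hosts() {
    # Strip comments, then print the first field of each remaining line.
    sed -e 's/#.*//' "$1" | awk 'NF > 0 { print $1 }'
}

# Example: schema_hosts "$HOME/lamtopo/Linux_Solaris"
```

Run against the file shown above, this would simply list pg1cluster01 and csultra01, which makes it easy to feed the same host list into other checks.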

   'ssh' works fine between the two machines and is set up so that it
does not prompt for a password.
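
One way to double-check this is to run a trivial command through ssh non-interactively and confirm that nothing except the expected text comes back, since banners or messages printed by shell startup files are a commonly reported source of remote-boot trouble. The sketch below is only an illustration under stated assumptions: `check_quiet` is a hypothetical helper, the remote-shell agent is passed in as an argument, and ssh's `BatchMode=yes` option is used in the example so that ssh fails instead of prompting.

```shell
#!/bin/sh
# check_quiet AGENT HOST: run "echo ok" on HOST through AGENT and report
# whether the combined stdout/stderr is exactly "ok".
# Hypothetical helper for illustration; not part of LAM.
check_quiet() {
    agent=$1
    host=$2
    # $agent is deliberately unquoted so that an agent such as
    # "ssh -o BatchMode=yes" is split into separate words.
    out=$($agent "$host" 'echo ok' 2>&1 </dev/null)
    if [ "$out" = "ok" ]; then
        echo "$host: clean"
    else
        echo "$host: unexpected output or failure"
        return 1
    fi
}

# Example: check_quiet "ssh -o BatchMode=yes" csultra01
```

Running the check in both directions (from each machine to the other) would show whether either side produces extra output or fails without a terminal.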

   When I run lamboot from the Linux machine, I get the following error:

lamboot on Linux machine
--------------------------------------
$lamboot -v $HOME/lamtopo/Linux_Solaris

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

n-1<18055> ssi:boot:base:linear: booting n0 (pg1cluster01)
n-1<18055> ssi:boot:base:linear: booting n1 (csultra01)
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "csultra01".
...
-----------------------------------------------------------------------------
n-1<18060> ssi:boot:base:linear: Failed to boot n1 (csultra01)
n-1<18060> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully

   When I try to lamboot from the Solaris machine, it hangs.

lamboot on Solaris machine
--------------------------------------
$lamboot -v $HOME/lamtopo/Solaris_Linux

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

n-1<1704> ssi:boot:base:linear: booting n0 (csultra01)
n-1<1704> ssi:boot:base:linear: booting n1 (pg1cluster01)
< ...hangs here... >

   However, with the '-d' (debug) switch, lamboot completes
successfully, and MPI applications then run correctly.

$ lamboot -d -v ~/lamtopo/csultra01_pg1
< ...lots of diagnostics... >
$ lamnodes
n0 csultra01.ucd.ie:1:origin,this_node
n1 pg1cluster01.ucd.ie:1:
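
To confirm that the booted universe really contains every expected machine, the `lamnodes` output can be checked against a list of host names. A rough sketch, assuming the `nN host:cpus:flags` line format shown above; `nodes_present` is a hypothetical helper that reads `lamnodes` output on standard input.

```shell
#!/bin/sh
# nodes_present HOST...: read lamnodes-style lines on stdin and succeed only
# if every HOST appears as the host part of some line, either as an exact
# name or as the first label of a fully qualified name.
# Hypothetical helper for illustration; not part of LAM.
nodes_present() {
    # Second field is "host:cpus:flags"; keep only the host part.
    listing=$(awk '{ split($2, f, ":"); print f[1] }')
    for want in "$@"; do
        found=no
        for have in $listing; do
            case $have in
                "$want"|"$want".*) found=yes ;;
            esac
        done
        if [ "$found" = no ]; then
            echo "missing: $want"
            return 1
        fi
    done
    echo "all nodes present"
}

# Example: lamnodes | nodes_present csultra01 pg1cluster01
```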

   Could you please let me know whether you have seen this problem
before, and whether there is a known solution?

   Thanks and Regards,
   Ravi.