Hi,
I have LAM-7.1.1 installed on a Solaris and a Linux machine. The
installation details are shown below:
Linux machine
---------------------
$uname -a
Linux pg1cluster01 2.6.8-1.521smp #1 SMP Mon Aug 16 09:25:06 EDT 2004 i686 i686 i386 GNU/Linux
$laminfo
LAM/MPI: 7.1.1
Prefix: /home/cs/manredd/lam-7.1.1/lam-7.1.1/LAM-Linux-2.6.8-1.521smp
Architecture: i686-pc-linux-gnu
Configured by: manredd
Configured on: Mon Nov 1 10:50:20 GMT 2004
Configure host: pg1cluster01
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C++ compiler: g++
Fortran compiler: g77
Fortran symbols: double_underscore
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: self (API v1.0, Module v1.0)
Solaris machine
---------------------
$uname -a
SunOS csultra01 5.9 Generic_112233-10 sun4u sparc SUNW,Ultra-5_10
$laminfo
LAM/MPI: 7.1.1
Prefix: /home/cs/manredd/lam-7.1.1/lam-7.1.1/LAM-SunOS-5.9
Architecture: sparc-sun-solaris2.9
Configured by:
Configured on: Tue Nov 2 14:19:31 GMT 2004
Configure host: csultra01
Memory manager: none
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C++ compiler: g++
Fortran compiler: g77
Fortran symbols: double_underscore
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: self (API v1.0, Module v1.0)
I have a boot schema file for lamboot that lists both machines:
$cat $HOME/lamtopo/Linux_Solaris
pg1cluster01
csultra01
'ssh' works fine between the two machines and is set up not to
prompt for a password.
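A minimal sketch of how the non-interactive ssh setup can be checked for every host in the boot schema. It assumes ssh is the remote agent and uses laminfo as a stand-in for "the LAM binaries are on the remote non-interactive PATH"; the names AGENT and check_hosts are made up for this illustration and are not part of LAM.

```shell
#!/bin/sh
# Sketch only: AGENT and check_hosts are illustrative names, not LAM features.
# BatchMode=yes makes ssh fail instead of prompting for a password.
AGENT="${AGENT:-ssh -o BatchMode=yes}"

check_hosts() {
    # $1: boot schema file, one hostname per line; '#' starts a comment and
    # anything after the hostname (e.g. cpu=2) is ignored here.
    schema=$1
    failed=0
    for host in $(sed -e 's/#.*//' "$schema" | awk 'NF { print $1 }'); do
        # Run laminfo remotely; a non-zero exit means either the login
        # failed or laminfo was not found on the remote PATH.
        if $AGENT "$host" laminfo </dev/null >/dev/null 2>&1; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
            failed=1
        fi
    done
    return $failed
}

# Example call (path taken from this post):
#   check_hosts "$HOME/lamtopo/Linux_Solaris"
```

A host reported as FAILED here would point at the ssh login or the remote PATH rather than at lamboot itself.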
When I run lamboot from the Linux machine, I get the following error:
lamboot on Linux machine
--------------------------------------
$lamboot -v $HOME/lamtopo/Linux_Solaris
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<18055> ssi:boot:base:linear: booting n0 (pg1cluster01)
n-1<18055> ssi:boot:base:linear: booting n1 (csultra01)
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "csultra01".
...
-----------------------------------------------------------------------------
n-1<18060> ssi:boot:base:linear: Failed to boot n1 (csultra01)
n-1<18060> ssi:boot:base:linear: aborted!
lamboot did NOT complete successfully
When I try to lamboot from the Solaris machine, it hangs.
lamboot on Solaris machine
--------------------------------------
$lamboot -v $HOME/lamtopo/Solaris_Linux
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<1704> ssi:boot:base:linear: booting n0 (csultra01)
n-1<1704> ssi:boot:base:linear: booting n1 (pg1cluster01)
< ...hangs here... >
However, with the '-d' switch, lamboot works fine, and MPI
applications also run successfully.
$ lamboot -d -v ~/lamtopo/csultra01_pg1
< ...lots of diagnostics... >
$ lamnodes
n0 csultra01.ucd.ie:1:origin,this_node
n1 pg1cluster01.ucd.ie:1:
Could you please let me know whether you have seen this problem
before? Is there a known solution?
Thanks and Regards,
Ravi.