
LAM/MPI General User's Mailing List Archives


From: Bastian Goldluecke (bg_at_[hidden])
Date: 2003-10-21 01:24:02


On Monday 20 October 2003 23:54, Amey Dharurkar wrote:
> Hi,
> Can you provide some more details (specifically the error which you
> get when the recon fails at 'ssh paris -n echo $SHELL')?
>
> Amey S. Dharurkar

Okay, here is the output of recon -d -v:

<output>
n0<829> ssi:boot: Opening
n0<829> ssi:boot: opening module globus
n0<829> ssi:boot: initializing module globus
n0<829> ssi:boot:globus: globus-job-run not found, globus boot will not run
n0<829> ssi:boot: module not available: globus
n0<829> ssi:boot: opening module rsh
n0<829> ssi:boot: initializing module rsh
n0<829> ssi:boot:rsh: module initializing
n0<829> ssi:boot:rsh:agent: ssh
n0<829> ssi:boot:rsh:username: <same>
n0<829> ssi:boot:rsh:verbose: 1000
n0<829> ssi:boot:rsh:algorithm: linear
n0<829> ssi:boot:rsh:priority: 10
n0<829> ssi:boot: module available: rsh, priority: 10
n0<829> ssi:boot: finalizing module globus
n0<829> ssi:boot:globus: finalizing
n0<829> ssi:boot: closing module globus
n0<829> ssi:boot: Selected boot module rsh
n0<829> ssi:boot:base: looking for boot schema in following directories:
n0<829> ssi:boot:base: <current directory>
n0<829> ssi:boot:base: $TROLLIUSHOME/etc
n0<829> ssi:boot:base: $LAMHOME/etc
n0<829> ssi:boot:base: /usr/local/etc
n0<829> ssi:boot:base: looking for boot schema file:
n0<829> ssi:boot:base: lamhosts
n0<829> ssi:boot:base: found boot schema: lamhosts
n0<829> ssi:boot:rsh: found the following hosts:
n0<829> ssi:boot:rsh: n0 139.19.50.1 (cpu=2)
n0<829> ssi:boot:rsh: n1 mpiat5300 (cpu=1)
n0<829> ssi:boot:rsh: n2 mpiat5301 (cpu=1)
n0<829> ssi:boot:rsh: n3 mpiat5210 (cpu=1)
n0<829> ssi:boot:rsh: n4 mpiat5202 (cpu=1)
n0<829> ssi:boot:rsh: n5 mpiat5304 (cpu=1)
n0<829> ssi:boot:rsh: n6 paris (cpu=1)
n0<829> ssi:boot:rsh: resolved hosts:
n0<829> ssi:boot:rsh: n0 139.19.50.1 --> 139.19.50.1 (origin)
n0<829> ssi:boot:rsh: n1 mpiat5300 --> 139.19.50.18
n0<829> ssi:boot:rsh: n2 mpiat5301 --> 139.19.50.19
n0<829> ssi:boot:rsh: n3 mpiat5210 --> 139.19.50.17
n0<829> ssi:boot:rsh: n4 mpiat5202 --> 139.19.50.7
n0<829> ssi:boot:rsh: n5 mpiat5304 --> 139.19.50.26
n0<829> ssi:boot:rsh: n6 paris --> 139.19.3.40
n0<829> ssi:boot:rsh: starting RTE procs
n0<829> ssi:boot:base:linear: starting
n0<829> ssi:boot:base:linear: booting n0 (139.19.50.1)
n0<829> ssi:boot:rsh: starting recon on (139.19.50.1)
n0<829> ssi:boot:rsh: starting on n0 (139.19.50.1): tkill -N -d -v
n0<829> ssi:boot:rsh: launching locally
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back: /tmp/lam-bg_at_mpiat5100/lam-killfile
tkill: removing socket file ...
tkill: removing IO daemon socket file ...
tkill: f_kill = "/tmp/lam-bg_at_mpiat5100/lam-killfile"
tkill: nothing to kill: "/tmp/lam-bg_at_mpiat5100/lam-killfile"
n0<829> ssi:boot:rsh: successfully launched on n0 (139.19.50.1)

... other nodes successfully launched ...

n0<829> ssi:boot:base:linear: booting n6 (paris)
n0<829> ssi:boot:rsh: starting recon on (paris)
n0<829> ssi:boot:rsh: starting on n6 (paris): tkill -N -d -v
n0<829> ssi:boot:rsh: launching remotely
n0<829> ssi:boot:rsh: attempting to execute "ssh paris -n echo $SHELL"
-----------------------------------------------------------------------------
LAM failed to execute a process on the remote node "paris".
LAM was not trying to invoke any LAM-specific commands yet -- we were
simply trying to determine what shell was being used on the remote
host.

LAM tried to use the remote agent command "ssh"
to invoke "echo $SHELL" on the remote node.

This usually indicates an authentication problem with the remote
agent, or some other configuration type of error in your .cshrc or
.profile file. The following is a list of items that you may wish to
check on the remote node:
</output>

The message continues with a description of some common problems. I think I have everything set up correctly, since I can execute the command manually in a shell.
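One difference worth ruling out is that recon runs the command non-interactively (`ssh -n`, with no terminal attached), while a manual run in a shell has a terminal and can answer password or host-key prompts. The following is a minimal Python sketch that approximates the recon check under stated assumptions: `probe_remote_shell` is a hypothetical helper, and treating a nonzero exit status or any output on stderr as a failure is an assumption about what recon considers an error, not something taken from the LAM source.

```python
import subprocess


def probe_remote_shell(cmd, timeout=30):
    """Run cmd non-interactively and report whether it looks clean.

    Hypothetical helper. ASSUMPTION: a nonzero exit status or any
    stderr output counts as failure; this approximates the recon
    check and is not LAM's actual implementation.
    Returns (ok, stripped_stdout, stripped_stderr).
    """
    try:
        proc = subprocess.run(
            cmd,
            stdin=subprocess.DEVNULL,  # no terminal input, similar to ssh -n
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A command waiting for a password prompt never finishes on its own.
        return False, "", "timed out"
    except OSError as exc:
        # For example, the remote agent binary is not on the local PATH.
        return False, "", str(exc)
    ok = proc.returncode == 0 and proc.stderr == ""
    return ok, proc.stdout.strip(), proc.stderr.strip()
```

To try it against paris, pass the same argument vector recon logged: `probe_remote_shell(["ssh", "paris", "-n", "echo", "$SHELL"])`. Because the arguments are given as a list, no local shell is involved, so `$SHELL` reaches ssh literally and is expanded on the remote host, as in the recon attempt. Any warning printed by ssh or by a login script then shows up in the returned stderr string.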

Here is the laminfo output from the Linux nodes:

           LAM/MPI: 7.0.2
            Prefix: /usr/local
      Architecture: i686-pc-linux-gnu
     Configured by: bg
     Configured on: Thu Oct 2 14:33:07 CEST 2003
    Configure host: mpiat5100
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)

and from paris:

           LAM/MPI: 7.0.2
            Prefix: /NWG/MM/usr/bg/baselib-solaris
      Architecture: sparc-sun-solaris2.9
     Configured by: bg
     Configured on: Wed Oct 8 17:02:17 MEST 2003
    Configure host: paris
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)

- Bastian.