LAM/MPI General User's Mailing List Archives

From: GosselinA_at_[hidden]
Date: 2003-09-12 08:43:36


Thanks for your help.

I configured LAM/MPI version 7.0 with "--with-rsh=ssh".

Here are excerpts from the 'lamboot' output when run with the '-d' option.
I show only the output related to one node (n1) in my host file; it is
identical for all the others.

1) % lamboot -d hostfile
...
n-1<13309> ssi:boot: Opening
n-1<13309> ssi:boot: opening module globus
n-1<13309> ssi:boot: initializing module globus
n-1<13309> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<13309> ssi:boot: module not available: globus
n-1<13309> ssi:boot: opening module rsh
n-1<13309> ssi:boot: initializing module rsh
n-1<13309> ssi:boot:rsh: module initializing
n-1<13309> ssi:boot:rsh:agent: ssh
n-1<13309> ssi:boot:rsh:username: <same>
n-1<13309> ssi:boot:rsh:verbose: 1000
n-1<13309> ssi:boot:rsh:algorithm: linear
n-1<13309> ssi:boot:rsh:priority: 10
n-1<13309> ssi:boot: module available: rsh, priority: 10
n-1<13309> ssi:boot: finalizing module globus
n-1<13309> ssi:boot:globus: finalizing
n-1<13309> ssi:boot: closing module globus
n-1<13309> ssi:boot: Selected boot module rsh
n0<13306> ssi:boot:base:server: got connection from 142.130.48.15
n0<13306> ssi:boot:base:server: this connection is expected (n0)
n0<13306> ssi:boot:base:server: remote lamd is at 142.130.48.15:6273
n0<13306> ssi:boot:base:linear: booting n1 (nordet.qc.dfo.ca)
n0<13306> ssi:boot:rsh: starting lamd on (nordet.qc.dfo.ca)
n0<13306> ssi:boot:rsh: starting on n1 (nordet.qc.dfo.ca): hboot -t -c lam-conf.lamd -d -s -I "-H 142.130.48.15 -P 43343 -n 1 -o 0"
n0<13306> ssi:boot:rsh: launching remotely
n0<13306> ssi:boot:rsh: -b used, assuming same shell on remote nodes
n0<13306> ssi:boot:rsh: got local shell /bin/bash
n0<13306> ssi:boot:rsh: attempting to execute "ssh nordet.qc.dfo.ca -n hboot -t -c lam-conf.lamd -d -s -I "-H 142.130.48.15 -P 43343 -n 1 -o 0""
Enter passphrase for key '/home/gosselin_a/.ssh/id_dsa':

We have "rsh:agent: ssh" as expected, and an ssh connection is attempted on
node 1.

2) Now I set the rsh agent through "LAMRSH" and run lamboot again (following
a 'lamhalt' of course).
    % export LAMRSH=rsh
    % lamboot -d hostfile
...
n-1<13317> ssi:boot: Opening
n-1<13317> ssi:boot: opening module globus
n-1<13317> ssi:boot: initializing module globus
n-1<13317> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<13317> ssi:boot: module not available: globus
n-1<13317> ssi:boot: opening module rsh
n-1<13317> ssi:boot: initializing module rsh
n-1<13317> ssi:boot:rsh: module initializing
n-1<13317> ssi:boot:rsh:agent: rsh
n-1<13317> ssi:boot:rsh:username: <same>
n-1<13317> ssi:boot:rsh:verbose: 1000
n-1<13317> ssi:boot:rsh:algorithm: linear
n-1<13317> ssi:boot:rsh:priority: 10
n-1<13317> ssi:boot: module available: rsh, priority: 10
n-1<13317> ssi:boot: finalizing module globus
n-1<13317> ssi:boot:globus: finalizing
n-1<13317> ssi:boot: closing module globus
n-1<13317> ssi:boot: Selected boot module rsh
n0<13314> ssi:boot:base:server: got connection from 142.130.48.15
n0<13314> ssi:boot:base:server: this connection is expected (n0)
n0<13314> ssi:boot:base:server: remote lamd is at 142.130.48.15:6785
n0<13314> ssi:boot:base:linear: booting n1 (nordet.qc.dfo.ca)
n0<13314> ssi:boot:rsh: starting lamd on (nordet.qc.dfo.ca)
n0<13314> ssi:boot:rsh: starting on n1 (nordet.qc.dfo.ca): hboot -t -c lam-conf.lamd -d -s -I "-H 142.130.48.15 -P 43348 -n 1 -o 0"
n0<13314> ssi:boot:rsh: launching remotely
n0<13314> ssi:boot:rsh: -b used, assuming same shell on remote nodes
n0<13314> ssi:boot:rsh: got local shell /bin/bash
n0<13314> ssi:boot:rsh: attempting to execute "rsh nordet.qc.dfo.ca -n hboot -t -c lam-conf.lamd -d -s -I "-H 142.130.48.15 -P 43348 -n 1 -o 0""
...

We get "rsh:agent: rsh", so rsh is the agent as expected, and an rsh
connection is attempted.

3) Now I try the (supposedly) equivalent "-ssi boot_rsh_agent rsh" command
line option, after unsetting the LAMRSH environment variable and running 'lamhalt'.

    % lamboot -d -ssi boot_rsh_agent rsh hostfile

n-1<13340> ssi:boot: Opening
n-1<13340> ssi:boot: opening module globus
n-1<13340> ssi:boot: initializing module globus
n-1<13340> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<13340> ssi:boot: module not available: globus
n-1<13340> ssi:boot: opening module rsh
n-1<13340> ssi:boot: initializing module rsh
n-1<13340> ssi:boot:rsh: module initializing
n-1<13340> ssi:boot:rsh:agent: rsh
n-1<13340> ssi:boot:rsh:username: <same>
n-1<13340> ssi:boot:rsh:verbose: 1000
n-1<13340> ssi:boot:rsh:algorithm: linear
n-1<13340> ssi:boot:rsh:priority: 10
n-1<13340> ssi:boot: module available: rsh, priority: 10
n-1<13340> ssi:boot: finalizing module globus
n-1<13340> ssi:boot:globus: finalizing
n-1<13340> ssi:boot: closing module globus
n-1<13340> ssi:boot: Selected boot module rsh
n0<13337> ssi:boot:base:server: got connection from 142.130.48.15
n0<13337> ssi:boot:base:server: this connection is expected (n0)
n0<13337> ssi:boot:base:server: remote lamd is at 142.130.48.15:8321
n0<13337> ssi:boot:base:linear: booting n1 (nordet.qc.dfo.ca)
n0<13337> ssi:boot:rsh: starting lamd on (nordet.qc.dfo.ca)
n0<13337> ssi:boot:rsh: starting on n1 (nordet.qc.dfo.ca): hboot -t -c lam-conf.lamd -d -s -I "-H 142.130.48.15 -P 43364 -n 1 -o 0"
n0<13337> ssi:boot:rsh: launching remotely
n0<13337> ssi:boot:rsh: -b used, assuming same shell on remote nodes
n0<13337> ssi:boot:rsh: got local shell /bin/bash
n0<13337> ssi:boot:rsh: attempting to execute "ssh nordet.qc.dfo.ca -n hboot -t -c lam-conf.lamd -d -s -I "-H 142.130.48.15 -P 43364 -n 1 -o 0""

As expected, we get "rsh:agent: rsh", so rsh should be the agent. But
surprisingly, an ssh connection is attempted afterwards.
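In case 3 the inconsistency sits between two lines of the same '-d' trace: the value printed after "rsh:agent:" and the first word of the "attempting to execute" command. A small filter can check for this mechanically across many nodes or runs. What follows is a minimal C sketch that assumes only the log line formats shown above; the function names (reported_agent, executed_agent, agent_mismatch) are hypothetical and not part of LAM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy the word starting at s into out, stopping at a space, a double
   quote, a newline, the end of the string, or when out is full. */
static void copy_word(const char *s, char *out, size_t n)
{
    size_t i = 0;

    assert(out != NULL && n > 0);
    while (s[i] != '\0' && s[i] != ' ' && s[i] != '"' && s[i] != '\n'
           && i + 1 < n) {
        out[i] = s[i];
        i++;
    }
    out[i] = '\0';
}

/* From a line such as "n-1<13340> ssi:boot:rsh:agent: rsh", extract the
   reported agent ("rsh"). Returns 1 on success, 0 if the marker is absent. */
int reported_agent(const char *line, char *out, size_t n)
{
    const char *marker = "ssi:boot:rsh:agent: ";
    const char *p = strstr(line, marker);

    if (p == NULL)
        return 0;
    copy_word(p + strlen(marker), out, n);
    return 1;
}

/* From a line such as '... attempting to execute "ssh host -n hboot ..."',
   extract the program actually launched ("ssh"). Returns 1 on success. */
int executed_agent(const char *line, char *out, size_t n)
{
    const char *marker = "attempting to execute \"";
    const char *p = strstr(line, marker);

    if (p == NULL)
        return 0;
    copy_word(p + strlen(marker), out, n);
    return 1;
}

/* Returns 1 only if both lines parse and they name different agents. */
int agent_mismatch(const char *agent_line, const char *exec_line)
{
    char reported[64], executed[64];

    if (!reported_agent(agent_line, reported, sizeof reported) ||
        !executed_agent(exec_line, executed, sizeof executed))
        return 0;
    return strcmp(reported, executed) != 0;
}
```

Fed the two relevant lines from case 3, agent_mismatch() returns 1 (rsh reported, ssh executed); fed the corresponding lines from case 2, it returns 0.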

It seems that the command-line setting is not honoured at all. I spent a few
moments looking at the code. The 'attempting to execute "ssh
nordet.qc.dfo.ca ..."' output comes from a function in the file
'ssi_boot_rsh_initexec.c', and the command text is built by a call to the
function 'add_rsh()'. That function looks only at the LAMRSH environment
variable or the compiled-in default. Unless I am mistaken, this explains the
behavior above.
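If that reading of add_rsh() is right, the difference comes down to which sources are consulted when choosing the agent. The C sketch below contrasts the precedence one would expect (an explicit "-ssi boot_rsh_agent" value, then LAMRSH, then the compiled-in default) with the behavior attributed to add_rsh(), where the SSI value is never read. It is an illustration under those assumptions, not LAM's source: expected_agent, inferred_add_rsh_agent and COMPILED_DEFAULT_AGENT are hypothetical names, and "ssh" stands in for the "--with-rsh=ssh" build default.

```c
#define _POSIX_C_SOURCE 200112L /* exposes setenv()/unsetenv() in <stdlib.h> for callers */
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the default baked in by --with-rsh=ssh. */
#define COMPILED_DEFAULT_AGENT "ssh"

/* Return s if it is a non-empty string, otherwise NULL. */
static const char *nonempty(const char *s)
{
    return (s != NULL && s[0] != '\0') ? s : NULL;
}

/* Assumed intended precedence: -ssi boot_rsh_agent, then LAMRSH,
   then the compiled-in default. */
const char *expected_agent(const char *ssi_agent)
{
    const char *agent = nonempty(ssi_agent);

    if (agent == NULL)
        agent = nonempty(getenv("LAMRSH"));
    if (agent == NULL)
        agent = COMPILED_DEFAULT_AGENT;
    assert(agent != NULL && agent[0] != '\0');
    return agent;
}

/* Behavior inferred from add_rsh(): the SSI value is ignored, so only
   LAMRSH or the compiled-in default can be chosen. */
const char *inferred_add_rsh_agent(const char *ssi_agent)
{
    (void)ssi_agent; /* never consulted; this is what makes case 3 fall back to ssh */
    return expected_agent(NULL);
}
```

With LAMRSH unset, expected_agent("rsh") returns "rsh" while inferred_add_rsh_agent("rsh") returns "ssh", which matches case 3; with LAMRSH=rsh both return "rsh", which matches case 2.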

But you say you do not see this problem. Exactly which version are you
running?

Again, here is what 'laminfo' gives me:
           LAM/MPI: 7.0
            Prefix: /usr
      Architecture: i586-mandrake-linux-gnu
     Configured by: root
     Configured on: Wed Sep 10 20:56:04 EDT 2003
    Configure host: euclide.qc.dfo.ca
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)

Regards
-----Original Message-----
From: Jeff Squyres [mailto:jsquyres_at_[hidden]]
Sent: Thursday, September 11, 2003 18:17
To: General LAM/MPI mailing list
Subject: Re: LAM: LAMRSH vs boot_rsh_agent : boot_rsh_agent seems not
functional

On Thu, 11 Sep 2003 GosselinA_at_[hidden] wrote:

> I compiled LAM/MPI 7.0 with "--with-rsh=ssh" to use ssh as the
> default. But sometimes I would like to revert to "rsh" to boot.
>
> If I do "export LAMRSH=rsh" , lamboot connects using rsh as expected,
> using ".rhosts" files on my nodes.

Good.

> But if I try this:
>
> % lamboot -v -ssi boot_rsh_agent rsh hostfile
>
> lamboot proceeds as if ssh were still my agent (asking for passphrases,
> for example, if no ssh-agent has been loaded with my private key).

Can you look at the output of "lamboot -d -ssi boot_rsh_agent rsh hostfile"?
That should tell you what LAM is running as its underlying remote agent
command. I just tried "-ssi boot_rsh_agent rsh" myself and it overrode the
underlying command as it should.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/
This list is archived at http://www.lam-mpi.org/MailArchives/lam/

