LAM/MPI General User's Mailing List Archives

From: Lars Grabow (grabow_at_[hidden])
Date: 2004-02-12 12:46:47


Hello,

Below is the complete output of 'lamboot -d'. It is actually using 'ssh
-x', but verbose seems to be set to 1000. Could this be causing the trouble?
Other than that, it looks fine to me.

Thanks for looking into this!

Lars

-------------------- 'lamboot -d' ---------------------

grabow_at_star47:~/CBE562/mpi_test/HelloWorld>lamboot -d machinefile
n0<4028> ssi:boot: Opening
n0<4028> ssi:boot: opening module globus
n0<4028> ssi:boot: initializing module globus
n0<4028> ssi:boot:globus: globus-job-run not found, globus boot will not run
n0<4028> ssi:boot: module not available: globus
n0<4028> ssi:boot: opening module rsh
n0<4028> ssi:boot: initializing module rsh
n0<4028> ssi:boot:rsh: module initializing
n0<4028> ssi:boot:rsh:agent: ssh -x
n0<4028> ssi:boot:rsh:username: <same>
n0<4028> ssi:boot:rsh:verbose: 1000
n0<4028> ssi:boot:rsh:algorithm: linear
n0<4028> ssi:boot:rsh:priority: 10
n0<4028> ssi:boot: module available: rsh, priority: 10
n0<4028> ssi:boot: finalizing module globus
n0<4028> ssi:boot:globus: finalizing
n0<4028> ssi:boot: closing module globus
n0<4028> ssi:boot: Selected boot module rsh

LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University

n0<4028> ssi:boot:base: looking for boot schema in following directories:
n0<4028> ssi:boot:base: <current directory>
n0<4028> ssi:boot:base: $TROLLIUSHOME/etc
n0<4028> ssi:boot:base: $LAMHOME/etc
n0<4028> ssi:boot:base: /usr/local/lam-7.0.3/etc
n0<4028> ssi:boot:base: looking for boot schema file:
n0<4028> ssi:boot:base: machinefile
n0<4028> ssi:boot:base: found boot schema: machinefile
n0<4028> ssi:boot:rsh: found the following hosts:
n0<4028> ssi:boot:rsh: n0 star48 (cpu=1)
n0<4028> ssi:boot:rsh: n1 star47 (cpu=1)
n0<4028> ssi:boot:rsh: resolved hosts:
n0<4028> ssi:boot:rsh: n0 star48 --> 11.0.0.48
n0<4028> ssi:boot:rsh: n1 star47 --> 11.0.0.47 (origin)
n0<4028> ssi:boot:rsh: starting RTE procs
n0<4028> ssi:boot:base:linear: starting
n0<4028> ssi:boot:base:server: opening server TCP socket
n0<4028> ssi:boot:base:server: opened port 33427
n0<4028> ssi:boot:base:linear: booting n0 (star48)
n0<4028> ssi:boot:rsh: starting lamd on (star48)
n0<4028> ssi:boot:rsh: starting on n0 (star48): hboot -t -c lam-conf.lamd -d -s -I "-H 11.0.0.47 -P 33427 -n 0 -o 1"
n0<4028> ssi:boot:rsh: launching remotely
n0<4028> ssi:boot:rsh: attempting to execute "ssh -x star48 -n echo $SHELL"
n0<4028> ssi:boot:rsh: remote shell /bin/tcsh
n0<4028> ssi:boot:rsh: attempting to execute "ssh -x star48 -n hboot -t -c lam-conf.lamd -d -s -I "-H 11.0.0.47 -P 33427 -n 0 -o 1""
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back:
/tmp/lam-grabow_at_[hidden]/lam-killfile
tkill: removing socket file ...
tkill: socket file:
/tmp/lam-grabow_at_[hidden]/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file:
/tmp/lam-grabow_at_[hidden]/lam-io-socket
tkill: f_kill =
"/tmp/lam-grabow_at_[hidden]/lam-killfile"
tkill: nothing to kill:
"/tmp/lam-grabow_at_[hidden]/lam-killfile"
hboot: performing tkill
hboot: tkill -d
hboot: booting...
hboot: fork /usr/local/bin/lamd
[1] 20105 lamd -H 11.0.0.47 -P 33427 -n 0 -o 1 -d
n0<4028> ssi:boot:rsh: successfully launched on n0 (star48)
n0<4028> ssi:boot:base:server: expecting connection from finite list
n0<4028> ssi:boot:base:server: got connection from 11.0.0.48
n0<4028> ssi:boot:base:server: this connection is expected (n0)
n0<4028> ssi:boot:base:server: remote lamd is at 11.0.0.48:32862
n0<4028> ssi:boot:base:linear: booting n1 (star47)
n0<4028> ssi:boot:rsh: starting lamd on (star47)
n0<4028> ssi:boot:rsh: starting on n1 (star47): hboot -t -c lam-conf.lamd -d -I -H 11.0.0.47 -P 33427 -n 1 -o 1
n0<4028> ssi:boot:rsh: launching locally
hboot: performing tkill
hboot: tkill -d
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back:
/tmp/lam-grabow_at_[hidden]/lam-killfile
tkill: removing socket file ...
tkill: socket file:
/tmp/lam-grabow_at_[hidden]/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file:
/tmp/lam-grabow_at_[hidden]/lam-io-socket
tkill: f_kill =
"/tmp/lam-grabow_at_[hidden]/lam-killfile"
tkill: nothing to kill:
"/tmp/lam-grabow_at_[hidden]/lam-killfile"
hboot: booting...
hboot: fork /usr/local/bin/lamd
hboot: attempting to execute
[1] 4033 lamd -H 11.0.0.47 -P 33427 -n 1 -o 1 -d
n0<4028> ssi:boot:rsh: successfully launched on n1 (star47)
n0<4028> ssi:boot:base:server: expecting connection from finite list
n-1<4033> ssi:boot: Opening
n-1<4033> ssi:boot: opening module globus
n-1<4033> ssi:boot: initializing module globus
n-1<4033> ssi:boot:globus: globus-job-run not found, globus boot will not run
n-1<4033> ssi:boot: module not available: globus
n-1<4033> ssi:boot: opening module rsh
n-1<4033> ssi:boot: initializing module rsh
n-1<4033> ssi:boot:rsh: module initializing
n-1<4033> ssi:boot:rsh:agent: ssh -x
n-1<4033> ssi:boot:rsh:username: <same>
n-1<4033> ssi:boot:rsh:verbose: 1000
n-1<4033> ssi:boot:rsh:algorithm: linear
n-1<4033> ssi:boot:rsh:priority: 10
n-1<4033> ssi:boot: module available: rsh, priority: 10
n-1<4033> ssi:boot: finalizing module globus
n-1<4033> ssi:boot:globus: finalizing
n-1<4033> ssi:boot: closing module globus
n-1<4033> ssi:boot: Selected boot module rsh
n0<4028> ssi:boot:base:server: got connection from 11.0.0.47
n0<4028> ssi:boot:base:server: this connection is expected (n1)
n0<4028> ssi:boot:base:server: remote lamd is at 11.0.0.47:32949
n0<4028> ssi:boot:base:server: closing server socket
n0<4028> ssi:boot:base:server: connecting to lamd at 11.0.0.48:33007
n0<4028> ssi:boot:base:server: connected
n0<4028> ssi:boot:base:server: sending number of links (2)
n0<4028> ssi:boot:base:server: sending info: n0 (star48)
n0<4028> ssi:boot:base:server: sending info: n1 (star47)
n0<4028> ssi:boot:base:server: finished sending
n0<4028> ssi:boot:base:server: disconnected from 11.0.0.48:33007
n0<4028> ssi:boot:base:server: connecting to lamd at 11.0.0.47:33430
n0<4028> ssi:boot:base:server: connected
n0<4028> ssi:boot:base:server: sending number of links (2)
n0<4028> ssi:boot:base:server: sending info: n0 (star48)
n0<4028> ssi:boot:base:server: sending info: n1 (star47)
n-1<4033> ssi:boot:rsh: finalizing
n-1<4033> ssi:boot: Closing
n0<4028> ssi:boot:base:server: finished sending
n0<4028> ssi:boot:base:server: disconnected from 11.0.0.47:33430
n0<4028> ssi:boot:base:linear: finished
n0<4028> ssi:boot:rsh: all RTE procs started
n0<4028> ssi:boot:rsh: finalizing
n0<4028> ssi:boot: Closing
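
A trace this long is easier to check if the agent setting and the exact
remote commands are pulled out on their own. The following is only a minimal
sketch in Python and is not part of LAM: the function name
summarize_lamboot_debug is made up for illustration, and the patterns assume
the 'ssi:boot:rsh:agent:' and 'attempting to execute' line formats seen in
the output above, with the mail client's line wrapping undone.

```python
import re

# Assumed line format: n0<4028> ssi:boot:rsh:agent: ssh -x
AGENT_RE = re.compile(r'ssi:boot:rsh:agent:\s*(.+)$')
# Assumed line format: n0<4028> ssi:boot:rsh: attempting to execute "<command>"
# The greedy match keeps any quotes nested inside the command itself.
EXEC_RE = re.compile(r'ssi:boot:rsh: attempting to execute "(.*)"\s*$')


def summarize_lamboot_debug(text):
    """Return (agents, commands) found in 'lamboot -d' output.

    Hypothetical helper: it only scans text that was already captured,
    it does not run lamboot itself.
    """
    agents, commands = [], []
    for line in text.splitlines():
        match = AGENT_RE.search(line)
        if match:
            agents.append(match.group(1).strip())
            continue
        match = EXEC_RE.search(line)
        if match:
            commands.append(match.group(1))
    return agents, commands


# Sample lines taken from the trace above (unwrapped).
sample = '''n0<4028> ssi:boot:rsh:agent: ssh -x
n0<4028> ssi:boot:rsh:verbose: 1000
n0<4028> ssi:boot:rsh: attempting to execute "ssh -x star48 -n echo $SHELL"
n0<4028> ssi:boot:rsh: remote shell /bin/tcsh
'''

agents, commands = summarize_lamboot_debug(sample)
print("agent(s):", agents)
for command in commands:
    print("executed:", command)
```

Run against a saved copy of the full output, this would list every command
handed to the remote agent, which can then be tried by hand on each node.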

> -----Original Message-----
> From: Prashanth [mailto:pcharapa_at_[hidden]]
> Sent: Thursday, February 12, 2004 9:20 AM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: ssh debugging help needed for lam-7.0.3
>
>
> Hello,
>
> Can you attach the entire output from the 'lamboot -d' command?
>
> Thanks.
>
> Prashanth Charapalli,
> LAM/MPI Team.
>
>
> Thus spoke Lars Grabow in the message sent on Wed, 11 Feb 2004
>
> ->Hi all,
> ->
> ->I've successfully installed lam-7.0.3 and am able to run parallel programs.
> ->There is only 1 little annoying thing: in all my error files I find a very
> ->lengthy warning for each process about the correct ssh usage:
> ->=====================
> ->n0<877> ssi:boot:base:linear: booting n0 (star47)
> ->n0<877> ssi:boot:base:linear: booting n1 (star48)
> ->n0<877> ssi:boot:base:linear: finished
> ->Usage: ssh [options] host [command]
> ->Options:
> -> -l user Log in using this user name.
> -> -n Redirect input from /dev/null.
> -> -F config Config file (default: ~/.ssh/config).
> -> -A Enable authentication agent forwarding.
> -> -a Disable authentication agent forwarding (default).
> -> -X Enable X11 connection forwarding.
> ->..................
> ->..................(repeated several times)
> ->======================
> ->I set the flag --with-rsh=ssh -x during LAM configuration and additionally
> ->tried setting LAMRSH="ssh -x". If I do ssh -x directly to all the nodes
> ->there's no warning. Is there a way to see what command is actually called by
> ->LAM?
> ->
> ->Thanks a lot!
> ->
> ->Lars
> ->
> ->_______________________________________________
> ->This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> ->