LAM/MPI General User's Mailing List Archives

From: Sayek, Ogan (osayek_at_[hidden])
Date: 2003-11-05 09:37:29


 
For some reason lamboot is failing on me; the full debug output is below.
Any ideas what's causing it?
 
$ lamboot -d -v lam-bhost.def
 
LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
 
lamboot: boot schema file: lam-bhost.def
lamboot: opening hostfile lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 oscarnode1.oscardomain
lamboot: n1 oscarnode2.oscardomain
lamboot: n2 linsrv
lamboot: resolved hosts:
lamboot: n0 oscarnode1.oscardomain --> 10.1.0.11
lamboot: n1 oscarnode2.oscardomain --> 10.1.0.12
lamboot: n2 linsrv --> 10.1.0.10
lamboot: found 3 host node(s)
lamboot: origin node is 2 (linsrv)
Executing hboot on n0 (oscarnode1.oscardomain - 1 CPU)...
lamboot: attempting to execute "/usr/bin/ssh -x -a oscarnode1.oscardomain -n echo $SHELL"
lamboot: got remote shell /bin/bash
lamboot: attempting to execute "/usr/bin/ssh -x -a oscarnode1.oscardomain -n hboot -t -c lam-conf.lam -d -v -s -I "-H 10.1.0.10 -P 38618 -n 0 -o 2 ""
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back:
/tmp/lam-osayek_at_oscarnode1.oscardomain/lam-killfile
tkill: removing socket file ...
tkill: socket file:
/tmp/lam-osayek_at_oscarnode1.oscardomain/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file:
/tmp/lam-osayek_at_oscarnode1.oscardomain/lam-io-socket
tkill: f_kill = "/tmp/lam-osayek_at_oscarnode1.oscardomain/lam-killfile"
tkill: nothing to kill:
"/tmp/lam-osayek_at_oscarnode1.oscardomain/lam-killfile"
hboot: performing tkill
hboot: tkill -d
hboot: booting...
hboot: fork /opt/lam-7.0/bin/lamd
[1] 4915 lamd -H 10.1.0.10 -P 38618 -n 0 -o 2 -d
Executing hboot on n1 (oscarnode2.oscardomain - 1 CPU)...
lamboot: attempting to execute "/usr/bin/ssh -x -a oscarnode2.oscardomain -n echo $SHELL"
lamboot: got remote shell /bin/bash
lamboot: attempting to execute "/usr/bin/ssh -x -a oscarnode2.oscardomain -n hboot -t -c lam-conf.lam -d -v -s -I "-H 10.1.0.10 -P 38618 -n 1 -o 2 ""
tkill: setting prefix to (null)
tkill: setting suffix to (null)
tkill: got killname back:
/tmp/lam-osayek_at_oscarnode2.oscardomain/lam-killfile
tkill: removing socket file ...
tkill: socket file:
/tmp/lam-osayek_at_oscarnode2.oscardomain/lam-kernel-socketd
tkill: removing IO daemon socket file ...
tkill: IO daemon socket file:
/tmp/lam-osayek_at_oscarnode2.oscardomain/lam-io-socket
tkill: f_kill = "/tmp/lam-osayek_at_oscarnode2.oscardomain/lam-killfile"
tkill: nothing to kill:
"/tmp/lam-osayek_at_oscarnode2.oscardomain/lam-killfile"
hboot: performing tkill
hboot: tkill -d
hboot: booting...
hboot: fork /opt/lam-7.0/bin/lamd
[1] 4425 lamd -H 10.1.0.10 -P 38618 -n 1 -o 2 -d
Executing hboot on n2 (linsrv - 1 CPU)...
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 10.1.0.10 -P 38618 -n 2 -o 2 ""
hboot: process schema = "lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
hboot: attempting to execute
[1] 23143 lamd -H 10.1.0.10 -P 38618 -n 2 -o 2 -d
topology n0...
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).
 
Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...
 
LAM 6.5.8/MPI 2 C++/ROMIO - Indiana University
 
Executing tkill on n0 (oscarnode1.oscardomain)...
Executing tkill on n1 (oscarnode2.oscardomain)...
Executing tkill on n2 (linsrv)...
lamboot did NOT complete successfully
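
One thing visible in the output above: hboot forks /opt/lam-7.0/bin/lamd on the two oscarnode hosts but /usr/bin/lamd on linsrv, and the lamboot banner reports LAM 6.5.8. So the nodes may not all be running the same LAM installation. Whether that is what breaks the boot is an assumption; the output does not confirm it. A minimal Python sketch for pulling the per-node lamd path out of a saved copy of this output follows. The function name lamd_paths and the SAMPLE text are illustrative only and are not part of LAM.

```python
import re

# Line patterns taken from the lamboot -d -v output shown above.
EXEC_RE = re.compile(r"^Executing hboot on (n\d+) \((\S+)")
FORK_RE = re.compile(r"^hboot: fork (\S+)")


def lamd_paths(log_text):
    """Map each node (e.g. 'n0') to the lamd path that hboot forked on it."""
    paths = {}
    node = None
    for line in log_text.splitlines():
        line = line.strip()
        m = EXEC_RE.match(line)
        if m:
            # Remember which node the following hboot lines belong to.
            node = m.group(1)
            continue
        m = FORK_RE.match(line)
        if m and node is not None:
            paths[node] = m.group(1)
    return paths


# Hypothetical excerpt: only the relevant lines from the output above.
SAMPLE = """\
Executing hboot on n0 (oscarnode1.oscardomain - 1 CPU)...
hboot: fork /opt/lam-7.0/bin/lamd
Executing hboot on n1 (oscarnode2.oscardomain - 1 CPU)...
hboot: fork /opt/lam-7.0/bin/lamd
Executing hboot on n2 (linsrv - 1 CPU)...
hboot: fork /usr/bin/lamd
"""

if __name__ == "__main__":
    paths = lamd_paths(SAMPLE)
    for node, path in sorted(paths.items()):
        print(node, path)
    if len(set(paths.values())) > 1:
        print("nodes are not all forking lamd from the same path")
```

If the paths differ, it may be worth checking which lamboot, hboot and lamd come first in the PATH on each node.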
 
--------------------------------------
OGAN SAYEK
Systems Engineer
Towson University
Computing and Network Services (CANS)
Email: osayek_at_[hidden]
Phone: 410.704.4256