LAM/MPI General User's Mailing List Archives

From: David Shattuck (shattuck_at_[hidden])
Date: 2002-08-31 20:14:04


Hi -

I am trying to boot a LAM cluster with two machines. One of them cannot
run lamboot even on itself. When I try, I get an error message with no
description of the error. Any idea what could be causing this? I have
included the output of both "lamboot" and "lamboot -d -v" below. SSH to
the machine works fine, and I have LAMRSH set to "ssh -x".
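
A minimal sketch of the setup described above, assuming a Bourne-style
shell such as bash (the shell in use is not stated; csh/tcsh would use
setenv instead). The guard around lamboot is only there so the snippet
is safe to run on a machine without LAM installed:

```shell
# Assumption: Bourne-style shell (bash/sh).
# LAMRSH names the remote-shell command LAM uses to reach other nodes.
export LAMRSH="ssh -x"

# Run the verbose, debug-enabled boot only if lamboot is on the PATH.
if command -v lamboot >/dev/null 2>&1; then
    lamboot -d -v || echo "lamboot exited with a non-zero status"
else
    echo "lamboot not found on PATH"
fi
```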

thanks,
David Shattuck
UCLA Laboratory of Neuro Imaging

[glitch_at_wulfpet3 glitch]$ lamboot

LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame

-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------

LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame

[glitch_at_wulfpet3 glitch]$ lamboot -d -v

LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame

lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 localhost
lamboot: resolved hosts:
lamboot: n0 localhost --> 127.0.0.1
lamboot: found 1 host node(s)
lamboot: origin node is 0 (localhost)
Executing hboot on n0 (localhost - 1 CPU)...
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 127.0.0.1 -P 32835 -n 0 -o 0 ""
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1] 10980 lamd -H 127.0.0.1 -P 32835 -n 0 -o 0 -d
hboot: attempting to execute
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...

LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame

Executing tkill on n0 (localhost)...
lamboot did NOT complete successfully
[glitch_at_wulfpet3 glitch]$

_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/