Hi -
I am trying to boot a LAM cluster with two machines. One of them cannot
lamboot itself: when I try, I get an error message that does not say what
the error actually is. Any idea what could be causing this? I have included
the output of both "lamboot" and "lamboot -d -v" below. SSH to the machine
works fine, and I have LAMRSH set to "ssh -x".
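For reference, lamboot takes its list of machines from a boot schema that
lists one hostname per line. The -d output below shows that the default
schema here (/etc/lam/lam-bhost.def) names only localhost, so that run boots
this one machine by itself. A schema covering both machines would look
roughly like this sketch. wulfpet4 is a hypothetical stand-in for the second
machine's hostname, and I am assuming LAM treats "#" lines as comments:

```
# Hypothetical two-node LAM boot schema, one host per line (assumed format).
# wulfpet3 is this machine (from the shell prompt); wulfpet4 is a placeholder.
wulfpet3
wulfpet4
```

A schema like this is passed as lamboot's argument (e.g. "lamboot -v hostfile")
instead of relying on the default file.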
Thanks,
David Shattuck
UCLA Laboratory of Neuro Imaging
[glitch@wulfpet3 glitch]$ lamboot
LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).
Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
[glitch@wulfpet3 glitch]$ lamboot -d -v
LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 localhost
lamboot: resolved hosts:
lamboot: n0 localhost --> 127.0.0.1
lamboot: found 1 host node(s)
lamboot: origin node is 0 (localhost)
Executing hboot on n0 (localhost - 1 CPU)...
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 127.0.0.1 -P 32835 -n 0 -o 0 ""
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1] 10980 lamd -H 127.0.0.1 -P 32835 -n 0 -o 0 -d
hboot: attempting to execute
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).
Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...
LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
Executing tkill on n0 (localhost)...
lamboot did NOT complete successfully
[glitch@wulfpet3 glitch]$
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/