LAM/MPI General User's Mailing List Archives

From: Susanne Hemker (shemker2_at_[hidden])
Date: 2004-12-01 08:47:49


Hi everybody,
I am trying to start LAM, but lamboot won't finish the boot process.
"recon -v boot_schema_file" worked fine. I have changed the boot schema
file to use IP addresses instead of the node names, as recommended in
the FAQ, but this did not resolve the problem. Can anybody tell me what
I might have done wrong or what I need to change in order to get LAM to
boot?
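
For reference, the boot schema file now lists the four nodes by IP
address, one per line. The following is only a sketch reconstructed from
the hosts that lamboot reports in the output below, not a verbatim copy
of the file; it assumes no per-host options such as a CPU count are set:

```text
# boot_schema_file (reconstructed sketch; first entry is the origin node)
192.168.0.65
192.168.0.66
192.168.0.67
192.168.0.68
```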

Here's the output from the boot attempt:

n65(15)% lamboot -d -v boot_schema_file

LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University

lamboot: boot schema file: boot_schema_file
lamboot: opening hostfile boot_schema_file
lamboot: found the following hosts:
lamboot: n0 192.168.0.65
lamboot: n1 192.168.0.66
lamboot: n2 192.168.0.67
lamboot: n3 192.168.0.68
lamboot: resolved hosts:
lamboot: n0 192.168.0.65 --> 192.168.0.65
lamboot: n1 192.168.0.66 --> 192.168.0.66
lamboot: n2 192.168.0.67 --> 192.168.0.67
lamboot: n3 192.168.0.68 --> 192.168.0.68
lamboot: found 4 host node(s)
lamboot: origin node is 0 (192.168.0.65)
Executing hboot on n0 (192.168.0.65 - 1 CPU)...
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.0.65 -P 32807 -n 0 -o 0 ""
hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
hboot: found /opt/lam-6.5.7/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /opt/lam-6.5.7/bin/lamd
[1] 1628 lamd -H 192.168.0.65 -P 32807 -n 0 -o 0 -d
hboot: attempting to execute
Executing hboot on n1 (192.168.0.66 - 1 CPU)...
lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n echo $SHELL"
lamboot: got remote shell /bin/tcsh
lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.65 -P 32807 -n 1 -o 0 ""
hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
hboot: found /opt/lam-6.5.7/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /opt/lam-6.5.7/bin/lamd
[1] 25996 lamd -H 192.168.0.65 -P 32807 -n 1 -o 0 -d
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...

LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University

Executing tkill on n0 (192.168.0.65)...
Executing tkill on n1 (192.168.0.66)...
lamboot did NOT complete successfully

Thanks for any suggestions,

Susanne