LAM/MPI General User's Mailing List Archives

From: Martin Dimitrov (slice4e_at_[hidden])
Date: 2003-05-17 22:42:44


I have two machines running RH 9. Running recon succeeded, but lamboot failed
with the error shown below. Any help would be greatly appreciated.
Martin

[slice4e@cserver slice4e]$ lamboot -vd
 
LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
 
lamboot: boot schema file: /home/slice4e/cluster/LAM/etc/lam-bhost.def
lamboot: opening hostfile /home/slice4e/cluster/LAM/etc/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 cserver
lamboot: n1 node01
lamboot: resolved hosts:
lamboot: n0 cserver --> 192.168.0.1
lamboot: n1 node01 --> 192.168.0.2
lamboot: found 2 host node(s)
lamboot: origin node is 0 (cserver)
Executing hboot on n0 (cserver - 1 CPU)...
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.0.1 -P 33708 -n 0 -o 0 ""
hboot: process schema = "/home/slice4e/cluster/LAM/etc/lam-conf.lam"
hboot: found /home/slice4e/cluster/LAM/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /home/slice4e/cluster/LAM/bin/lamd
hboot: attempting to execute
[1] 5025 lamd -H 192.168.0.1 -P 33708 -n 0 -o 0 -d
Executing hboot on n1 (node01 - 1 CPU)...
lamboot: attempting to execute "ssh -x node01 -n echo $SHELL"
lamboot: got remote shell /bin/bash
lamboot: attempting to execute "ssh -x node01 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.1 -P 33708 -n 1 -o 0 ""
hboot: process schema = "/home/slice4e/cluster/LAM/etc/lam-conf.lam"
hboot: found /home/slice4e/cluster/LAM/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /home/slice4e/cluster/LAM/bin/lamd
[1] 12491 lamd -H 192.168.0.1 -P 33708 -n 1 -o 0 -d
topology n1... -----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).
 
Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...
 
LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
 
Executing tkill on n0 (cserver)...
Executing tkill on n1 (node01)...
lamboot did NOT complete successfully
[slice4e@cserver slice4e]$