LAM/MPI General User's Mailing List Archives

From: Mahesh Salunkhe (mahesh.salunkhe_at_[hidden])
Date: 2009-04-30 06:16:52


Hello!
  I've installed lam-6.5.9-1.i386.rpm on my cluster of two machines:

192.168.10.130

192.168.10.129
  (Red Hat Enterprise Linux 3 is installed on one machine and Red Hat
Enterprise Linux 4 on the other.)
  recon runs successfully, but lamboot fails. Here is the output of the
command lamboot -d:

[mss_at_mss ~]$ lamboot -d

LAM 6.5.9/MPI 2 C++ - Indiana University

lamboot: boot schema file: /etc/lam/lam-bhost.def
lamboot: opening hostfile /etc/lam/lam-bhost.def
lamboot: found the following hosts:
lamboot: n0 192.168.10.130
lamboot: n1 192.168.10.129
lamboot: resolved hosts:
lamboot: n0 192.168.10.130 --> 192.168.10.130
lamboot: n1 192.168.10.129 --> 192.168.10.129
lamboot: found 2 host node(s)
lamboot: origin node is 0 (192.168.10.130)
lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -I " -H
192.168.10.130 -P 33130 -n 0 -o 0 ""
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
hboot: attempting to execute
[1] 4832 lamd -H 192.168.10.130 -P 33130 -n 0 -o 0 -d
lamboot: attempting to execute "/usr/bin/ssh -x -a 192.168.10.129 -n echo
$SHELL"
lamboot: got remote shell /bin/bash
lamboot: attempting to execute "/usr/bin/ssh -x -a 192.168.10.129 -n hboot
-t -c lam-conf.lam -d -s -I "-H 192.168.10.130 -P 33130 -n 1 -o 0 ""
hboot: process schema = "/etc/lam/lam-conf.lam"
hboot: found /usr/bin/lamd
hboot: performing tkill
hboot: tkill
hboot: booting...
hboot: fork /usr/bin/lamd
[1] 4858 lamd -H 192.168.10.130 -P 33130 -n 1 -o 0 -d
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process,
and will now attempt to kill all nodes that it was previously able to
boot (if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
wipe ...

LAM 6.5.9/MPI 2 C++ - Indiana University

Executing tkill on n0 (192.168.10.130)...
Executing tkill on n1 (192.168.10.129)...
lamboot did NOT complete successfully

Could you please tell me what the error is?

The problem arises when hboot is called on the remote machine.
I tried running the hboot command locally on the remote machine. The error
it gives is:
            kernel not found

kernel is the first command in /etc/lam/lam-conf.otb.
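
One way to narrow down a "not found" error like this might be to check whether each program named in the process schema can be located on the PATH of the remote machine. The following is only a minimal Python sketch. It assumes the schema lists one program per line (optionally followed by arguments) with `#` starting a comment, and that a plain PATH lookup approximates how hboot searches for programs; neither assumption is confirmed in this post. The function names `schema_programs` and `missing_programs` are hypothetical.

```python
import shutil
import sys


def schema_programs(schema_text):
    """Return the first token of every non-blank, non-comment line.

    Assumption: one program per line, '#' starts a comment.
    """
    programs = []
    for line in schema_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if line:
            programs.append(line.split()[0])
    return programs


def missing_programs(schema_text, path=None):
    """Return the schema programs that cannot be found on PATH.

    path=None means "use the current PATH environment variable".
    """
    return [p for p in schema_programs(schema_text)
            if shutil.which(p, path=path) is None]


if __name__ == "__main__":
    # Usage: python check_schema.py <schema file>
    with open(sys.argv[1]) as f:
        for name in missing_programs(f.read()):
            print("not found on PATH:", name)
```

Run on the remote machine against the schema file (for example /etc/lam/lam-conf.otb), this would print any listed program that a PATH lookup cannot resolve. Running it through a non-interactive ssh session may be more informative, since the PATH there can differ from the one in a login shell.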

-- 
Regards
Mahesh