This is a classic problem caused by your firewall; check the list archive
for earlier threads about it.
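If you want to confirm that the firewall is the culprit before changing it,
one rough check is whether node01 can open a TCP connection back to
cserver. In your log, the lamd on node01 is started with
"-H 192.168.0.1 -P 33708", which suggests it is expected to contact
cserver at that address. The script below is a minimal sketch, not part
of LAM. It assumes Python 3 is installed on both machines, and the names
check.py, listen_once and port_reachable are hypothetical. It only tests
TCP, so a pass does not prove that every port or protocol LAM needs is open.

```python
#!/usr/bin/env python3
"""Rough TCP reachability check between two nodes (hypothetical helper, not part of LAM)."""
import socket
import sys


def listen_once(port: int, timeout: float = 30.0) -> bool:
    """Accept a single TCP connection on `port`; return True if one arrives before `timeout`."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))  # listen on all interfaces
        srv.listen(1)
        srv.settimeout(timeout)
        try:
            conn, addr = srv.accept()
        except socket.timeout:
            return False
        conn.close()
        print("connection from", addr[0])
        return True


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out (typical of a firewall DROP rule), or unreachable.
        return False


def main(args: list) -> int:
    """Dispatch 'listen PORT' or 'connect HOST PORT'; return a shell exit status."""
    if len(args) == 2 and args[0] == "listen":
        ok = listen_once(int(args[1]))
    elif len(args) == 3 and args[0] == "connect":
        ok = port_reachable(args[1], int(args[2]))
    else:
        print("usage: check.py listen PORT | check.py connect HOST PORT", file=sys.stderr)
        return 2
    print("OK" if ok else "FAILED")
    return 0 if ok else 1


# Only run the command-line interface when arguments are given.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

For example, run "python3 check.py listen 5000" on cserver, then
"python3 check.py connect 192.168.0.1 5000" on node01 (port 5000 is
arbitrary). If the connect step fails while the listener is running,
something between the two machines, most likely iptables, is filtering
the traffic. On RH 9, "/sbin/service iptables status" shows whether the
firewall rules are active.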
Martin Dimitrov wrote:
> I have two machines running RH 9. I was able to run recon, but when I
> ran lamboot, this is the error that I got. Please help me if you can.
> I greatly appreciate your time and help.
> Martin
>
> [slice4e_at_cserver slice4e]$ lamboot -vd
>
> LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
>
> lamboot: boot schema file: /home/slice4e/cluster/LAM/etc/lam-bhost.def
> lamboot: opening hostfile /home/slice4e/cluster/LAM/etc/lam-bhost.def
> lamboot: found the following hosts:
> lamboot: n0 cserver
> lamboot: n1 node01
> lamboot: resolved hosts:
> lamboot: n0 cserver --> 192.168.0.1
> lamboot: n1 node01 --> 192.168.0.2
> lamboot: found 2 host node(s)
> lamboot: origin node is 0 (cserver)
> Executing hboot on n0 (cserver - 1 CPU)...
> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.0.1 -P 33708 -n 0 -o 0 ""
> hboot: process schema = "/home/slice4e/cluster/LAM/etc/lam-conf.lam"
> hboot: found /home/slice4e/cluster/LAM/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /home/slice4e/cluster/LAM/bin/lamd
> hboot: attempting to execute
> [1] 5025 lamd -H 192.168.0.1 -P 33708 -n 0 -o 0 -d
> Executing hboot on n1 (node01 - 1 CPU)...
> lamboot: attempting to execute "ssh -x node01 -n echo $SHELL"
> lamboot: got remote shell /bin/bash
> lamboot: attempting to execute "ssh -x node01 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.1 -P 33708 -n 1 -o 0 ""
> hboot: process schema = "/home/slice4e/cluster/LAM/etc/lam-conf.lam"
> hboot: found /home/slice4e/cluster/LAM/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /home/slice4e/cluster/LAM/bin/lamd
> [1] 12491 lamd -H 192.168.0.1 -P 33708 -n 1 -o 0 -d
> topology n1...
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
>
> Executing tkill on n0 (cserver)...
> Executing tkill on n1 (node01)...
> lamboot did NOT complete successfully
> [slice4e_at_cserver slice4e]$
>
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/