Thank you very much. I have two Ethernet cards, so I made one of them
(the one not connected to the Internet) a trusted device on both
machines, and it works.
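
For anyone finding this in the archive: recon only checks that commands can
be run on the remote node over ssh, while lamboot also needs the lamd daemons
to reach each other on dynamically chosen ports (the log below shows
"-P 33708"). A firewall that allows ssh but blocks other ports on the cluster
interface would therefore be consistent with this failure. A minimal sketch
of a connectivity check, assuming Python is available on both nodes; the
function name can_connect is hypothetical, and the host and port in the
comment are just the values from the log, not something LAM requires:

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds in time."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example use (assumed setup): start any TCP listener on cserver on a high
# port, then run on node01:
#     can_connect("192.168.0.1", 33708)
# False here, while ssh to the same host works, points at the firewall.
```

This only probes TCP in one direction, so it is a quick sanity check rather
than proof that every port LAM might pick is open.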
----- Original Message -----
From: "Jerome BENOIT" <jgmbenoit_at_[hidden]>
To: "General LAM/MPI mailing list" <lam_at_[hidden]>
Sent: ??????, ??? 18, 2003 4:47 AM
Subject: Re: LAM: recon works, but lamboot doesnt. Please help
> This is a classic issue caused by your firewall:
> check the archive.
>
> Martin Dimitrov wrote:
> > I have two machines running RH 9. I was able to run recon, but when I
> > ran lamboot, this is the error that I got. Please help me if you can.
> > I greatly appreciate your time and help.
> > Martin
> >
> > [slice4e_at_cserver slice4e]$ lamboot -vd
> >
> > LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
> >
> > lamboot: boot schema file: /home/slice4e/cluster/LAM/etc/lam-bhost.def
> > lamboot: opening hostfile /home/slice4e/cluster/LAM/etc/lam-bhost.def
> > lamboot: found the following hosts:
> > lamboot: n0 cserver
> > lamboot: n1 node01
> > lamboot: resolved hosts:
> > lamboot: n0 cserver --> 192.168.0.1
> > lamboot: n1 node01 --> 192.168.0.2
> > lamboot: found 2 host node(s)
> > lamboot: origin node is 0 (cserver)
> > Executing hboot on n0 (cserver - 1 CPU)...
> > lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.0.1 -P 33708 -n 0 -o 0 ""
> > hboot: process schema = "/home/slice4e/cluster/LAM/etc/lam-conf.lam"
> > hboot: found /home/slice4e/cluster/LAM/bin/lamd
> > hboot: performing tkill
> > hboot: tkill
> > hboot: booting...
> > hboot: fork /home/slice4e/cluster/LAM/bin/lamd
> > hboot: attempting to execute
> > [1] 5025 lamd -H 192.168.0.1 -P 33708 -n 0 -o 0 -d
> > Executing hboot on n1 (node01 - 1 CPU)...
> > lamboot: attempting to execute "ssh -x node01 -n echo $SHELL"
> > lamboot: got remote shell /bin/bash
> > lamboot: attempting to execute "ssh -x node01 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.1 -P 33708 -n 1 -o 0 ""
> > hboot: process schema = "/home/slice4e/cluster/LAM/etc/lam-conf.lam"
> > hboot: found /home/slice4e/cluster/LAM/bin/lamd
> > hboot: performing tkill
> > hboot: tkill
> > hboot: booting...
> > hboot: fork /home/slice4e/cluster/LAM/bin/lamd
> > [1] 12491 lamd -H 192.168.0.1 -P 33708 -n 1 -o 0 -d
> > topology n1...
> > -----------------------------------------------------------------------------
> > lamboot encountered some error (see above) during the boot process,
> > and will now attempt to kill all nodes that it was previously able to
> > boot (if any).
> >
> > Please wait for LAM to finish; if you interrupt this process, you may
> > have LAM daemons still running on remote nodes.
> > -----------------------------------------------------------------------------
> > wipe ...
> >
> > LAM 6.5.9/MPI 2 C++/ROMIO - Indiana University
> >
> > Executing tkill on n0 (cserver)...
> > Executing tkill on n1 (node01)...
> > lamboot did NOT complete successfully
> > [slice4e_at_cserver slice4e]$
> >
> >
> >
> >
> > ------------------------------------------------------------------------
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/