
LAM/MPI General User's Mailing List Archives


From: Andreas Wilde (Andreas.Wilde_at_[hidden])
Date: 2003-10-10 01:51:37


Hi Marcelo,
I had the same behaviour on my cluster. The problem is that localhost is
resolved to the IP address 127.0.0.1, and this address is passed to the other
(non-home) nodes of your cluster. lamboot cannot work with this address
because it is the loopback address: it always points to the machine the
program is running on. In short: 127.0.0.1 is not a valid home-node address.
What to do?
Your /etc/lam/lam-bhost.def should not contain 'localhost' or '127.0.0.1'. If
you have 3 machines named 'foo', 'bar' and 'dummy', you should list these
names in /etc/lam/lam-bhost.def. If recon then complains about missing hosts
or the like, something else is wrong. Most likely name-to-address resolution
is missing, e.g. the names listed in /etc/lam/lam-bhost.def are not listed in
/etc/hosts. In my case, it was a misconfigured kernel, which made the
detection of the local network devices impossible.
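
As a rough way to check a boot schema for the problem described above, here
is a minimal Python sketch. It is not part of LAM. The helper names
parse_bhost and check_hosts are made up for illustration, and the parsing
assumes one host name per line, optional trailing keys after the name, and
'#' comments. It flags every host that resolves to a loopback address or
does not resolve at all.

```python
import ipaddress
import socket


def parse_bhost(text):
    """Return the host names in a boot schema, skipping comments and blanks.

    Assumption: the host name is the first whitespace-separated token on
    each line, and '#' starts a comment.
    """
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line.split()[0])
    return hosts


def check_hosts(hosts, resolve=socket.gethostbyname):
    """Map each host to (address or None, problem description or None)."""
    report = {}
    for host in hosts:
        try:
            addr = resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            report[host] = (None, "does not resolve")
            continue
        if ipaddress.ip_address(addr).is_loopback:
            report[host] = (addr, "resolves to loopback")
        else:
            report[host] = (addr, None)
    return report


if __name__ == "__main__":
    # Path taken from the lamboot output quoted below.
    with open("/etc/lam/lam-bhost.def") as f:
        for host, (addr, problem) in check_hosts(parse_bhost(f.read())).items():
            print(host, addr, problem or "ok")
```

Run on each node, any line reporting "resolves to loopback" or "does not
resolve" points at an entry to fix in the hostfile or in /etc/hosts.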

so long,
andreas

On Friday 10 October 2003 01:20, you wrote:
> Dear All!
>
> I'm a newbie with LAM; just today I downloaded and installed it. I read
> that to bring up the "server" I need to run recon, and that part is
> working. The output:
>
> [marcelo_at_rosh-temp tmp]$ recon
> -----------------------------------------------------------------------------
> Woo hoo!
>
> recon has completed successfully. This means that you will most likely
> be able to boot LAM successfully with the "lamboot" command (but this
> is not a guarantee). See the lamboot(1) manual page for more
> information on the lamboot command.
>
> If you have problems booting LAM (with lamboot) even though recon
> worked successfully, enable the "-d" option to lamboot to examine each
> step of lamboot and see what fails. Most situations where recon
> succeeds and lamboot fails have to do with the hboot(1) command (that
> lamboot invokes on each host in the hostfile).
> -----------------------------------------------------------------------------
>
>
> But lamboot -d did not work, and lamd does not come up:
>
> [marcelo_at_rosh-temp tmp]$ lamboot -d
>
> LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
>
> lamboot: boot schema file: /etc/lam/lam-bhost.def
> lamboot: opening hostfile /etc/lam/lam-bhost.def
> lamboot: found the following hosts:
> lamboot: n0 localhost
> lamboot: resolved hosts:
> lamboot: n0 localhost --> 127.0.0.1
> lamboot: found 1 host node(s)
> lamboot: origin node is 0 (localhost)
> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -I " -H
> 127.0.0.1 -P 37542 -n 0 -o 0 ""
> hboot: process schema = "/etc/lam/lam-conf.lam"
> hboot: found /usr/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /usr/bin/lamd
> [1] 32735 lamd -H 127.0.0.1 -P 37542 -n 0 -o 0 -d
> hboot: attempting to execute
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process, and
> will now attempt to kill all nodes that it was previously able to boot (if
> any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.6/MPI 2 C++/ROMIO - University of Notre Dame
>
> Executing tkill on n0 (localhost)...
> lamboot did NOT complete successfully
>
>
> Does someone have an answer?
>
> thank you
>
> -marcelo
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
________________________________________________
Andreas Wilde
Fraunhofer-Institut fuer Integrierte Schaltungen
Aussenstelle Entwurfsautomatisierung
Zeunerstr. 38
D-01069 Dresden
Tel.: 49 (0) 351 4640 852
Fax : 49 (0) 351 4640 703
E-Mail: Andreas.Wilde_at_[hidden]
________________________________________________