Hello,
It is unfortunate that recent Linux distributions have been putting your
hostname on the "127.0.0.1 localhost" line in /etc/hosts.
127.0.0.1 is the loopback address, and shouldn't (in most circumstances)
have anything but variations of localhost and localhost.localdomain
associated with it. A real hostname, to be useful in a network, i.e.
with more than one machine, needs to have an externally reachable IP
address. Traffic sent to 127.0.0.1 never leaves the box it starts from.
The issue with LAM is that lamboot remotely executes a command (hboot) on
each node in your cluster, and on the command line it sends the IP address
(or hostname, if you use the -l option) of the node you are starting LAM
from. hboot uses that address to call home and join the LAM environment.
If that address turns out to be 127.0.0.1, the remote node ends up talking
to itself (its own localhost) and the boot fails.
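A quick way to see whether a node is affected is to check what its own
hostname resolves to. The following is only a rough Python sketch, not
part of LAM; the helper names are made up for illustration, and it assumes
that the system resolver gives the same answer LAM would get:

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True if the address lies in a loopback range such as 127.0.0.0/8."""
    return ipaddress.ip_address(address).is_loopback


def check_own_hostname():
    """Resolve this machine's hostname and report whether it is loopback."""
    name = socket.gethostname()
    try:
        addr = socket.gethostbyname(name)
    except OSError as exc:
        return f"{name} does not resolve: {exc}"
    if is_loopback(addr):
        return f"{name} -> {addr}: loopback, other nodes cannot call back"
    return f"{name} -> {addr}: not a loopback address"


if __name__ == "__main__":
    print(check_own_hostname())
```

Run it on the node you start lamboot from. If it reports a loopback
address, the hostname is most likely sitting on the 127.0.0.1 line.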
So, in short, edit all your /etc/hosts files to look identical,
something like this:
127.0.0.1 localhost localhost.localdomain
192.168.1.1 lilian1
192.168.1.2 lilian2
192.168.1.3 lilian3
192.168.1.4 lilian4
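If you have many nodes, you may want to check each hosts file rather than
eyeball it. Here is a small Python sketch under my own assumptions (the
function name and the set of acceptable names are mine, and it only looks
at IPv4 loopback lines); it lists any name on a loopback line that is not
a localhost variant:

```python
import ipaddress

# Names that are fine to leave on the loopback line (an assumption).
LOCAL_NAMES = {"localhost", "localhost.localdomain"}


def stray_loopback_names(hosts_text):
    """Return names bound to an IPv4 loopback address that are not localhost."""
    stray = []
    for line in hosts_text.splitlines():
        # Drop comments, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line, skip it
        if addr.version == 4 and addr.is_loopback:
            stray.extend(n for n in fields[1:] if n.lower() not in LOCAL_NAMES)
    return stray


if __name__ == "__main__":
    with open("/etc/hosts") as f:
        print(stray_loopback_names(f.read()))
```

An empty list means the loopback line is clean. Anything else is a name
that should be moved to a line with the node's real address.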
--
Tim Mattox - tmattox_at_[hidden] - http://homepage.mac.com/tmattox/
http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/
On Thu, 18 Sep 2003, Nelson Brito wrote:
> > Hi,
> > maybe I'm too stupid to get it, but it doesn't work here. I changed
>
> No, perhaps you're just much smarter than me...
>
> > Just speculating:
> > The line
> >
> > [1] 21162 lamd -H 127.0.0.1 -P 13813 -n 1 -o 0 -d
> >
> > says, that lamd is executed on lilian2 with a given home node 127.0.0.1,
> > right? If lamd on lilian2 tries to contact some process via 127.0.0.1, it
> > lands on lilian2, not on the home node, which is lilian1 (I tried it from
> > lilian1 for change). Hence lamboot fails.
>
> Yes, you're right, but if lamd tries to contact via 192.168.1.2 it will end
> up on lilian2 as well.
> I'm not that sure how the daemon works, but I think it just waits for
> other processes to call it, and then all the message passing is explicit in
> the code (I don't know if the daemon controls the requests to give an answer).
>
> Well, I'm not sure about this, and I don't have LAM to test at the moment.
> Let's hope that someone else can give us some help :-(
>
> nelson
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>