On Fri, 19 Sep 2003, Andreas Wilde wrote:
> thank you for your tips. My /etc/hosts and nsswitch.conf files look
> exactly like you suggested. I tracked the problem down to another point:
> The local IP addresses are obtained with a call to
>
> ioctl(sock, SIOCGIFCONF, &config)
>
> inside getifaddr(), which is called by find_orig(), which is called by
> recon, lamboot and wipe. The point is that getifaddr() returns only
> *ONE* IP address, which happens to be 127.0.0.1. To my understanding it
> should deliver at least two: the loopback and the address of the NIC,
> 192.168.1.1. It fails to do so. If it came up with both of these
> addresses, find_orig() would choose the right one and return it to recon
> etc., which would be happy then.
Excellent detective work. Yes, this finally appears to be the root of the
problem. When you run that ioctl() on just about any Unix-like system, you
should get back a list of all the configured network interfaces on your
node, not just the loopback.
> There are two possible explanations for the problem: 1. getifaddr() has
> some weird bug, which makes it work under most, but not all
> circumstances, or 2. I accidentally found a way to misconfigure my box
> so that ioctl(..., SIOCGIFCONF, ...) returns only one IP address.
Which distro and kernel version are you running?
If you've managed to track all this down, you might want to cut-n-paste
LAM's getifaddr() function out into a small, isolated test program and run
it outside of the larger scope of LAM. See if you get the same results
(only one interface returned).
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/