On Feb 13, 2005, at 1:05 PM, Aditya Datey wrote:
>> - Did you confirm that LAM was getting the right IP address for coral?
> Yes, in the sense that none of the messages show something like
> 127.0.0.1, which I found was a common error in the list archives. All
> the messages show the correct IP for that machine.
Ok, good.
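For anyone who wants to double-check this outside of LAM, here is a minimal Python sketch (not part of LAM; the function names are made up for illustration) that resolves a hostname and flags a loopback result, which is the 127.0.0.1 symptom mentioned above:

```python
import ipaddress
import socket


def is_loopback(addr):
    """Return True if the IP address string is in the loopback range."""
    return ipaddress.ip_address(addr).is_loopback


def check_host(hostname):
    """Resolve hostname and return (address, is_loopback)."""
    addr = socket.gethostbyname(hostname)
    return addr, is_loopback(addr)


if __name__ == "__main__":
    name = socket.gethostname()
    try:
        addr, loop = check_host(name)
    except socket.gaierror as exc:
        print("could not resolve", name, ":", exc)
    else:
        print(name, "->", addr)
        if loop:
            # Other nodes cannot reach this machine via a loopback address.
            print("warning: hostname resolves to a loopback address")
```

Run on each node, this should print the address that other machines are expected to use to reach it.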
>> - You might want to check with local system administrators to see if
>> any firewalls are in place between the machines (e.g., at the router
>> or switch level)
> I checked with the friendly neighbourhood sysadmin, and got it that
> there was nothing that would prevent opening of ssh on random sockets.
Be careful not to mix your metaphors here (so to speak). ssh and
random sockets (at least in a LAM context) are two different things.
LAM can use ssh, but it will only use ssh on whatever your default
ports are (unless you specify a -p argument in $LAMRSH, for example).
LAM does require random sockets to be able to be used between all
nodes, but that is unrelated to rsh or ssh.
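To illustrate what "random sockets" means here, the following is a rough Python sketch, not LAM's actual code. It binds a TCP listener to port 0 so the operating system picks a free port, then tries to connect to it. The function names are hypothetical, and the loopback address is only a stand-in: to test a real pair of nodes you would listen on one machine and call `can_connect` from the other with that machine's address and the printed port.

```python
import socket


def listen_on_random_port(host="127.0.0.1"):
    """Bind to port 0 so the OS assigns a free port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered by a firewall, timed out, or unreachable.
        return False


if __name__ == "__main__":
    srv, port = listen_on_random_port()
    print("listening on port", port)
    print("reachable:", can_connect("127.0.0.1", port))
    srv.close()
```

If a connection like this fails between two nodes while ssh between them works, that points at something filtering arbitrary TCP ports rather than at the remote-shell setup.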
> This is verified by the fact that I can boot LAM successfully on 4 of
> the 10 machines I'm trying to get it working on.
Ok. So this machine that you're having a problem with is a 5th machine
that you'd like to add to the mix?
> Now the 4 working machines are a heterogeneous mix, kernel- and RH
> version-wise. But all run LAM 7.0.6. None of the machines are older
> than RH8,
> and most of them have the 2.4.22 linux kernel.
> Now when I compiled the kernels for the machines, it is possible that I
> selected different things (to get the sound card working etc.) on the
> machines.
>
> ** So one reason I can think of for why it is working on some but not
> all machines is that LAM needs something in the kernel that I did not
> put in?
No, LAM does very little in the kernel -- just system calls through
libc, etc. (unless you're using Myrinet or Infiniband, but even then,
we're using the user-level access libraries -- LAM has no
kernel-specific code). So if your kernels are slightly different, LAM
won't care.
The most important aspect is to have the same version of LAM natively
compiled on each machine. Check out the heterogeneity questions in the
FAQ for more details.
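As a small aid for the version check, here is a hedged Python sketch that compares LAM version strings you have already collected from each machine (by whatever means) and reports any mismatch. The host names and the helper names are hypothetical; only the 7.0.6 version number comes from this thread.

```python
from collections import defaultdict


def group_by_version(versions):
    """Map each version string to the sorted list of hosts reporting it."""
    groups = defaultdict(list)
    for host, version in versions.items():
        groups[version].append(host)
    return {version: sorted(hosts) for version, hosts in groups.items()}


def versions_consistent(versions):
    """Return True if every host reports the same version."""
    return len(set(versions.values())) <= 1


if __name__ == "__main__":
    # Hypothetical data: fill in what each machine actually reports.
    reported = {"node1": "7.0.6", "node2": "7.0.6", "node3": "7.0.4"}
    if not versions_consistent(reported):
        for version, hosts in sorted(group_by_version(reported).items()):
            print(version, ":", ", ".join(hosts))
```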
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/