Jeff,
Thank you for your prompt reply. We are using LAM 7.0, and it seems to be
working fine on our system for now. We may upgrade to 7.1.1 later.
We just found the problem, and the fault was on our part. The /tmp
directory on the compute nodes didn't have the right permissions for
users, so bind() failed for lamd. Once the permissions were changed,
lamboot was successful.
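For anyone who hits the same lamboot timeout, here is a minimal Python sketch of two checks that could be run on each compute node. It assumes the usual convention that /tmp should be mode 1777 (world-writable with the sticky bit set), and it assumes the failing bind() was on a Unix-domain socket created under /tmp. Both function names and the probe file name are hypothetical; nothing here is part of LAM.

```python
import os
import socket
import stat


def tmp_is_usable(path="/tmp"):
    """Return True if `path` has the conventional shared-temp mode 1777
    (world-writable, with the sticky bit set)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == 0o1777


def can_bind_unix_socket(directory="/tmp"):
    """Try to bind a Unix-domain socket inside `directory`.

    Returns True if bind() succeeds and False if it raises OSError
    (for example EACCES when the directory is not writable by this user).
    The socket file is removed again afterwards.
    """
    # Hypothetical probe file name; the PID keeps concurrent runs apart.
    path = os.path.join(directory, "bindprobe-%d" % os.getpid())
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    bound = False
    try:
        sock.bind(path)
        bound = True
    except OSError:
        pass
    finally:
        sock.close()
        if bound:
            os.unlink(path)  # clean up only the file we created
    return bound


if __name__ == "__main__":
    # Run as the same ordinary (non-root) user that runs lamboot.
    print("/tmp mode is 1777:", tmp_is_usable())
    print("bind() in /tmp works:", can_bind_unix_socket())
```

Run it as the user who runs lamboot: as root the bind() probe would succeed regardless of the directory permissions, so it would not reveal the problem.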
Thanks again for your prompt reply and the info on the new version of
LAM.
Lily
On Sat, 2005-01-22 at 07:06, Jeff Squyres wrote:
> On Jan 21, 2005, at 5:01 PM, Lily Li wrote:
>
> > We have a small Linux cluster running Red Hat 9. The front end node
> > can be reached from the outside world, but the rest of the nodes on
> > the cluster can only be accessed through the front end node. We use
> > LAM/MPI 7.0 with ssi tcp. Can we use LAM with this kind of
> > configuration?
>
> Absolutely. As long as all nodes that will be used in the LAM universe
> can communicate directly with each other, everything should be fine.
>
> > The lamboot command fails with the following messages. Please note
> > the "got connection from" line, which gives a nonsense IP address:
>
> It looks like you are using LAM/MPI 7.0; I believe that there was a bug
> in the help message in this version. Any chance that you can upgrade
> to the latest (7.1.1)?
>
> > [snipped]
> > hboot: fork /cm/production/r3.00/ap/local/lam-7/LINUXM/bin/lamd
> > [1] 23750 lamd -x -H 192.168.1.1 -P 54122 -n 1 -o 0 -d
> > n0<27570> ssi:boot:rsh: successfully launched on n1 (liv1)
> > n0<27570> ssi:boot:base:server: expecting connection from finite list
> > n0<27570> ssi:boot:base:server: got connection from 223.213.12.64
> > --------------------------------------------------------------------------
> > The lamboot agent timed out while waiting for the newly-booted process
> > to call back and indicated that it had successfully booted.
>
> Have you verified all the conditions listed in the help message (e.g.,
> no network filters between your hosts, etc.)?
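To check the "no network filters" condition by hand, a minimal Python sketch like the following could be run on each node against every other node. It only tests whether a plain TCP connection can be opened; the host and port are placeholders you supply (lamd picks its ports dynamically, e.g. `-P 54122` in the output above), so point it at a port you know is listening. It does not exercise LAM itself.

```python
import socket
import sys


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened
    within `timeout` seconds; False on refusal, timeout, or DNS failure."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Usage: python reach.py HOST PORT  (both are placeholders you supply)
    host, port = sys.argv[1], int(sys.argv[2])
    print(host, port, "reachable" if can_reach(host, port) else "NOT reachable")
```

A connection that succeeds in one direction but not the other usually points at a firewall or NAT rule on one of the hosts.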