
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-07-08 14:45:13


On Tue, 8 Jul 2003, Gurgul, Dennis J. wrote:

> I have a 5 node cluster with OSCAR 2.2 and Lam 6.5.9. All 4 internal
> nodes are identical. But, while 2 of them will work, the other two will
> [snipped]
> The last line in the output before the error message (lamboot
> encountered some error.....) is:
>
> topology n3...
>
> The error message says "(see above)"; however, there is nothing to
> indicate what went wrong.

That's fairly odd. :-)

It seems like this might be a networking problem -- the "topology" phase
is where LAM is sending around the connection information.

- Did the firewall software somehow get enabled on any of the nodes?
  In OSCAR, pfilter *should* be configured to allow connections from any
  port to any port within the cluster.
- Can you ssh between all the nodes properly (without a password)?
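One way to run the ssh check for every pair of nodes at once is the small script below. It is a hedged sketch, not a LAM or OSCAR tool. The hostnames in `NODES` are hypothetical placeholders, and `pair_ok` / `failing_pairs` are illustrative names. The script relies on ssh's `BatchMode=yes` option, which makes ssh fail instead of prompting for a password, so any pair that would prompt is reported as a failure. It only tests ssh. It does not test whether pfilter is blocking the ports LAM uses in the topology phase.

```python
import itertools
import subprocess
from typing import Callable, Iterable, List, Tuple

# Hypothetical hostnames: substitute the nodes listed in your LAM boot schema.
NODES = ["n0", "n1", "n2", "n3", "n4"]

# BatchMode=yes makes ssh exit with an error instead of prompting for a password.
SSH = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]


def pair_ok(src: str, dst: str, run: Callable = subprocess.run) -> bool:
    """True if src can run `true` on dst over ssh with no password prompt."""
    # First hop: this machine -> src; second hop (run on src): src -> dst.
    # A failure on either hop makes the whole pair fail.
    cmd = SSH + [src] + SSH + [dst, "true"]
    try:
        result = run(cmd, stdin=subprocess.DEVNULL, stdout=subprocess.DEVNULL,
                     stderr=subprocess.DEVNULL, timeout=30)
    except (subprocess.TimeoutExpired, OSError):
        # Hung connection, or no ssh binary on this machine.
        return False
    return result.returncode == 0


def failing_pairs(hosts: Iterable[str],
                  run: Callable = subprocess.run) -> List[Tuple[str, str]]:
    """Return every ordered (src, dst) pair where non-interactive ssh fails."""
    return [(s, d) for s, d in itertools.permutations(list(hosts), 2)
            if not pair_ok(s, d, run=run)]
```

Running `print(failing_pairs(NODES))` from the head node should print an empty list when every node can reach every other node without a password. A pair such as `("n3", "n1")` in the output points at the node whose keys or `known_hosts` need attention. The `run` parameter exists only so the ssh call can be replaced with a fake for testing.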

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/