LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-12-17 07:54:13


On Dec 16, 2004, at 10:19 PM, Yu-Cheng Chou wrote:

> Below is the error message I get when I run the lamboot command.
> Do you have any idea how to fix this booting problem?

You might want to try exactly what it says in the error message. :-)

> $ lamboot -d machines
[snipped]
> The lamboot agent failed to open a client socket to the newly-booted
> process at IP address 169.237.108.59, port 32974.
>
> Although the newly-booted process has already communicated
> successfully with the lamboot agent over other TCP sockets, this is
> the first time that the lamboot agent tried to initiate a connection
> to the newly-booted process. As such, this may indicate:
>
> 1. 169.237.108.59 is not the correct IP address for the machine
> where the newly-booted machine was launched
> 2. There are network filters between the lamboot agent host and
> the remote host such that communication on random TCP ports
> is blocked
> 3. Network routing from the local host to the remote isn't
> properly configured (this is unlikely)
>
> For number 1, check to ensure that 169.237.108.59 is the correct IP
> address for that machine. If it is not, check the host mapping on
> that machine (e.g., /etc/hosts) to ensure that 169.237.108.59 is both
> reachable by the host where the lamboot agent is running, and is the
> correct host.
>
> For numbers 2 and 3, try to telnet to 169.237.108.59, port 32974. You
> should get a "connection refused" error, which will indicate that you
> successfully connected to some machine at that IP address, and no
> process was listening on that port. If you get any other kind of
> error, check with your system/network administrator -- it may indicate
> network / routing issues between the two hosts.
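
If telnet isn't handy, the same check can be sketched in a few lines of Python. This is only an illustration, not part of LAM: probe_tcp is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption. It makes one TCP connection attempt and reports whether it was refused (the result the message above says to expect), accepted, or failed some other way.

```python
import errno
import socket


def probe_tcp(host, port, timeout=5.0):
    """Make one TCP connection attempt and classify the outcome.

    Hypothetical helper for illustration; the timeout is an assumption.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # some process is listening on that port
    except ConnectionRefusedError:
        # Reached a machine at that address, but nothing listens on the port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable"
        return "error: %s" % exc


# Example call, using the address and port from the error message:
#   print(probe_tcp("169.237.108.59", 32974))
```

Run it from the host where lamboot was invoked. "refused" matches the "connection refused" result described above; "timeout" or "unreachable" are the kind of "other" errors worth taking to your system/network administrator.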

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/