
LAM/MPI General User's Mailing List Archives


From: Krishnakanth Subramaniam (ksubramz_at_[hidden])
Date: 2003-07-11 17:14:34


Some more details:

1. recon worked just fine.
2. lamhosts consisted of 2 nodes:
166.82.1.18
166.82.1.11

Thanks in advance,
krish

On Fri, 11 Jul 2003, Krishnakanth Subramaniam wrote:

> Hello world,
>
>
> The following is the output of % cat lamboot.log,
> which was generated by % lamboot -d lamhosts 2>lamboot.log
>
>
> <snip>
> n0<23065> ssi:boot:rsh: starting lamd on (166.82.1.11)
> n0<23065> ssi:boot:rsh: starting on n1 (166.82.1.11): hboot -t -c lam-conf.lamd -d -s -I "-H 166.82.1.18 -P 39345 -n 1 -o 0"
> n0<23065> ssi:boot:rsh: launching remotely
> n0<23065> ssi:boot:rsh: attempting to execute "rsh 166.82.1.11 -n echo $SHELL"
> n0<23065> ssi:boot:rsh: remote shell /bin/bash
> n0<23065> ssi:boot:rsh: attempting to execute "rsh 166.82.1.11 -n hboot -t -c lam-conf.lamd -d -s -I "-H 166.82.1.18 -P 39345 -n 1 -o 0""
> n0<23065> ssi:boot:rsh: successfully launched on n1 (166.82.1.11)
> n0<23065> ssi:boot:base:server: expecting connection from finite list
> n0<23065> ssi:boot:base:server: got connection from 166.82.1.11
> n0<23065> ssi:boot:base:server: this connection is expected (n1)
> n0<23065> ssi:boot:base:server: remote lamd is at 166.82.1.11:4224
> n0<23065> ssi:boot:base:server: closing server socket
> n0<23065> ssi:boot:base:server: connecting to lamd at 166.82.1.18:39346
> n0<23065> ssi:boot:base:server: connected
> n0<23065> ssi:boot:base:server: sending number of links (2)
> n0<23065> ssi:boot:base:server: sending info: n0 (166.82.1.18)
> n0<23065> ssi:boot:base:server: sending info: n1 (166.82.1.11)
> n-1<23068> ssi:boot:rsh: finalizing
> n-1<23068> ssi:boot: Closing
> n0<23065> ssi:boot:base:server: finished sending
> n0<23065> ssi:boot:base:server: disconnected from 166.82.1.18:39346
> n0<23065> ssi:boot:base:server: connecting to lamd at 166.82.1.11:33456
> -----------------------------------------------------------------------------
> The lamboot agent failed to open a client socket to the newly-booted
> process at IP address 166.82.1.11, port 33456.
>
> Although the newly-booted process has already communicated
> successfully with the lamboot agent over other TCP sockets, this is
> the first time that the lamboot agent tried to initiate a connection
> to the newly-booted process. As such, this may indicate:
>
> 1. 166.82.1.11 is not the correct IP address for the machine where the
> newly-booted machine was launched
> 2. There are network filters between the lamboot agent host and
> the remote host such that communication on random TCP ports
> is blocked
> 3. Network routing from the the local host to the remote isn't
> properly configured (this is unlikely)
>
> For number 1, check to ensure that 166.82.1.11 is the correct IP address for
> that machine. If it is not, check the host mapping on that machine
> (e.g., /etc/hosts) to ensure that 166.82.1.11 is both reachable and is the by
> the host where the lamboot agent is running, and is the correct host.
>
> For numbers 2 and 4, try to telnet to 166.82.1.11, port 33456. You should get a
> "connection refused" error, which will indicate that you successfully
> connected to some machine at that IP address, and no process was
> listening on that port. If you get any other kind of error, check
> with your system/network administrator -- it may indicate network /
> routing issues between the two hosts.
> -----------------------------------------------------------------------------
>
>
> Now, the problem is not #1.
> For nos. 2 and 3, I did telnet to the machine, and it did give me a
> "connection refused" error.
>
> So lamboot doesn't work. Has anyone encountered this error? Please shed some light.
>
> TIA
> krish
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
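
For anyone repeating the telnet check described in the quoted error message, here is a minimal sketch of the same test in Python. It is not part of LAM/MPI; the function name probe_tcp_port and the result labels are made up for illustration. It only classifies what a single TCP connection attempt does. Going by the error text above, "refused" would suggest the host is reachable and nothing is listening on that port, while a timeout or another error might point to filtering or routing problems.

```python
import errno
import socket


def probe_tcp_port(host, port, timeout=5.0):
    """Try one TCP connection to host:port and classify the outcome.

    Hypothetical helper for illustration; the returned labels are
    arbitrary strings, not anything LAM/MPI itself reports.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The remote host answered, but no process accepts on this port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout; packets may be silently dropped.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. EHOSTUNREACH or ENETUNREACH.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))


if __name__ == "__main__":
    # Address and port taken from the lamboot log quoted above.
    print(probe_tcp_port("166.82.1.11", 33456))
```

Note that the port in the log is chosen anew on each lamboot run, so the probe is only meaningful against the port reported by the most recent failed attempt.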