LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-06-16 08:10:48


On Jun 16, 2005, at 9:04 AM, Swan wrote:

> I have executed the following command which the lamboot executed too,
> and here is its output:
>  
> [vasptest_at_orlon31 vasptest]$ /usr/local/gt321/bin/globus-job-run
> orlon31 -env PATH=`/bin/echo $PATH` /usr/local/lam-7.1.1-org/bin/hboot
> -t -c /usr/local/lam-7.1.1-org/etc/lam-conf.lamd -d -v -I "-H
> 137.189.27.88 -P 47576 -n 0 -o 0" -prefix /usr/local/lam-7.1.1-org
> tkill: setting prefix to (null)
> tkill: setting suffix to (null)
> tkill: got killname back:
> /tmp/lam-vasptest_at_[hidden]/lam-killfile
> tkill: removing socket file ...
> tkill: socket file:
> /tmp/lam-vasptest_at_[hidden]/lam-kernel-socketd
> tkill: removing IO daemon socket file ...
> tkill: IO daemon socket file:
> /tmp/lam-vasptest_at_[hidden]/lam-io-socket
> tkill: f_kill =
> "/tmp/lam-vasptest_at_[hidden]/lam-killfile"
> tkill: killing LAM...
> tkill: killing PID (SIGHUP) 23484 ..
> tkill:  already dead
> tkill: all finished
> hboot: performing tkill
> hboot: /usr/local/lam-7.1.1-org/bin/tkill -d
> hboot: booting...
> hboot: fork /usr/local/lam-7.1.1-org/bin/lamd
> [1]  23702 lamd -H 137.189.27.88 -P 47576 -n 0 -o 0 -d
> ssi_boot_send_lamd_info: sfh_sock_open_clt_inet_stm failed: Connection
> refused

That's a good sign -- it indicates that orlon31 was able to try to open
a socket to 137.189.27.88, but something actively refused the
connection. I can't tell from this output whether it was orlon31
itself or some intermediary.

The error that you're seeing here is because lamboot is no longer
running and therefore no longer listening on that port. So this error
is to be expected. Double-check the connectivity between these
machines again -- it is imperative that full TCP connectivity be available
between them (i.e., a socket can be opened from any port to any other
port between the machines). You can use a tool like netpipe to verify
this.
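
If netpipe is not at hand, a quick probe along the following lines can
at least show whether a single TCP connection opens, is refused, or
times out. This is only a rough sketch and not part of LAM/MPI: the
function name check_tcp and the result strings are made up for
illustration, and something must already be listening on the target
port for the probe to report "open".

```python
import errno
import socket


def check_tcp(host, port, timeout=3.0):
    """Try one TCP connection to host:port and describe the outcome.

    Hypothetical helper for illustration only. Returns "open",
    "refused", "timeout", or "error: <errno name or message>".
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The target (or something in between) actively rejected the connection.
        return "refused"
    except socket.timeout:
        # No answer at all, which often points to a firewall dropping packets.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

For example, check_tcp("137.189.27.88", 47576) run from orlon31 would
be expected to return "refused" once lamboot has exited, matching the
output above. A "timeout" on an arbitrary port would more likely
suggest that something between the machines is filtering traffic. A
single probe like this does not prove full any-port-to-any-port
connectivity; it only helps narrow down where to look.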

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/