
LAM/MPI General User's Mailing List Archives


From: Vishal Sahay (vsahay_at_[hidden])
Date: 2004-04-06 11:35:28


Could you send the following:

- The exact lamboot command you invoke, and how many nodes you are booting on.
It looks like you are booting only on the current node, using "lamboot -d"
without a hostfile. I just want to confirm this.

- The complete output of "lamboot -d"

- The value of your PATH environment variable
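
In case it helps, here is a minimal sketch of a script that gathers most of
this in one go. It is only an illustration: the helper names list_on_path
and lam_report are made up for this example and are not part of LAM. It
prints PATH and LAMHOME, then lists every copy of lamboot, hboot and lamd
found along PATH in search order, so a second installation shadowing the
first is easy to spot.

```shell
#!/bin/sh
# Sketch only: list_on_path and lam_report are hypothetical helpers,
# not LAM commands.

# Print every executable named $1 found along PATH, in search order.
# The first line printed is the copy a PATH lookup would pick.
list_on_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -n "$dir" ] || dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Print the environment settings and the visible LAM executables.
lam_report() {
    echo "PATH=$PATH"
    echo "LAMHOME=${LAMHOME:-<unset>}"
    for tool in lamboot hboot lamd; do
        echo "== $tool on PATH, in search order =="
        list_on_path "$tool"
    done
}

lam_report
# The complete boot log can then be captured separately, for example:
#   lamboot -d > lamboot-d.log 2>&1
```

If more than one lamd shows up, the order of the lines tells you which
directory wins the PATH lookup.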

-Vishal

On Tue, 6 Apr 2004, I Kozin wrote:

#
# Hello,
#
# here is the problem:
# we've got a 4 processor Intel Itanium2 box and want to
# use LAM (shared memory environment only).
#
# LAM 6.5 is already installed, but it was built with
# gcc (v2.95), and I cannot link code compiled with
# Intel Fortran 8.0 against the existing LAM (the MPI
# function names are not resolved).
#
# This is a known problem according to the LAM FAQ,
# and the solution is to rebuild LAM. OK, I downloaded
# LAM 7.04 and compiled it. Now, I don't want to remove
# the old LAM because it might be useful if someone wants
# to use gcc. Instead I decided to install LAM locally
# in my home directory. I appended the PATH variable
# so that the new path to LAM overrides the old one.
# I also pointed LAMHOME to the local dir (just in case).
#
# I could not see any problems during make and install,
# but when I run lamboot it returns an error.
# Although laminfo points to the local dir,
#
# "lamboot -d" shows
# ...
# hboot: found /usr/bin/lamd
#
# which it should not. ["which lamd" points to my local dir as well]
#
# and after that
#
# hboot: performing tkill
# hboot: tkill
# hboot: booting...
# hboot: fork /usr/bin/lamd
# [1] 25211 lamd -H 127.0.0.1 -P 1324 -n 0 -o 0 -d
# hboot: attempting to execute
# n0<25208> ssi:boot:rsh: successfully launched on n0 (localhost)
# n0<25208> ssi:boot:base:server: expecting connection from finite list
# n0<25208> ssi:boot:base:server: got connection from 127.0.0.1
# n0<25208> ssi:boot:base:server: this connection is expected (n0)
# -----------------------------------------------------------------------------
# The lamboot agent failed to read a message over a socket from the
# newly-booted process. This should not happen (especially since TCP is
# a guaranteed protocol).
#
# Please check your network connectivity and ensure that messages can be
# passed reliably over TCP. Additionally, ensure that the host where
# the newly-booted process was launched is healthy and still available
# on the network.
# -----------------------------------------------------------------------------
# n0<25208> ssi:boot:base:server: failed to connect to remote lamd!
# n0<25208> ssi:boot:base:server: closing server socket
# n0<25208> ssi:boot:base:linear: aborted!
#
# What is going on?
# Your help is greatly appreciated!
#
# Igor
#
# config.log, make.log and make-install.log can be sent on request.