I've installed lam-6.5.9-1. For the time being I'm running it on two machines,
but it seems the TCP connection from the other end is not getting established.
recon runs successfully, but lamboot gives the following error. What could be
the reason?
(The file lamhosts contains two entries:
master
192.168.10.130)
The output of the lamboot command is:
lamboot -v lamhosts
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<4664> ssi:boot:base:linear: booting n0 (master)
n-1<4664> ssi:boot:base:linear: booting n1 (192.168.10.130)
-----------------------------------------------------------------------------
The lamboot agent failed to read a message over a socket from the
newly-booted process. This should not happen (especially since TCP is
a guaranteed protocol).
*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.
You should probably check the following:
- Network connectivity: Ensure that messages can be passed reliably
over TCP using random ports.
- Environment / PATH settings: Ensure that you are running the same
version of LAM/MPI on all nodes. Sometimes premature disconnects
(and therefore this error message) may be caused if mismatched
versions of LAM are used on different nodes.
- Node health: Ensure that the host where the newly-booted process was
launched is healthy and still available on the network.
-----------------------------------------------------------------------------
n-1<4664> ssi:boot:base:linear: aborted!
n-1<4670> ssi:boot:base:linear: booting n0 (master)
n-1<4670> ssi:boot:base:linear: booting n1 (192.168.10.130)
tkill: killing LAM...
n-1<4670> ssi:boot:base:linear: finished
lamboot did NOT complete successfully
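The first suggestion in the message above (TCP on random ports) could be checked independently of LAM with something like the following. This is only a rough sketch: the helper names open_listener and can_connect are made up for illustration, and the demo at the bottom runs over loopback on a single machine, so it does not by itself prove anything about the link between the two nodes.

```python
import socket


def open_listener(host="127.0.0.1"):
    """Listen on an OS-chosen (ephemeral) TCP port; return (socket, port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))  # port 0 asks the OS to pick any free port
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Loopback demo only; hypothetical helpers, not part of LAM/MPI.
    srv, port = open_listener()
    print("listening on port", port)
    print("reachable:", can_connect("127.0.0.1", port))
    srv.close()
```

To try it across the two machines, one would presumably run open_listener on one node bound to an externally visible address (for example "0.0.0.0", which is an assumption about the setup) and call can_connect from the other node with the port it reports.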
--
Regards
Mahesh