LAM/MPI General User's Mailing List Archives

From: Vinicius de Lima (vinicius_at_[hidden])
Date: 2005-03-30 12:09:08


Hi,

Did you find the problem?

Vinicius.

Jeff Squyres wrote:

> Greetings.
>
> Can you try the suggestions listed in the help message and send back
> the results?
>
> Thanks!
>
>
> On Mar 29, 2005, at 2:42 PM, Vinicius de Lima wrote:
>
>> Hi,
>>
>> I'm having this problem (help me!!!):
>>
>> [swingle_at_swingle /]$ ssh -x swingle3 -n 'echo $SHELL'
>> /bin/bash
>> [swingle_at_swingle /]$ lamboot -v
>>
>> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>>
>> n-1<6733> ssi:boot:base:linear: booting n0 (swingle)
>> n-1<6733> ssi:boot:base:linear: booting n1 (swingle3)
>> -----------------------------------------------------------------------------
>> The lamboot agent timed out while waiting for the newly-booted process
>> to call back and indicated that it had successfully booted.
>>
>> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
>> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
>> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
>> *** MAILING LIST.
>>
>> As far as LAM could tell, the remote process started properly, but
>> then never called back. Possible reasons that this may happen:
>>
>> - There are network filters between the lamboot agent host and
>>   the remote host such that communication on random TCP ports
>>   is blocked
>> - Network routing from the remote host to the local host isn't
>>   properly configured (this is uncommon)
>>
>> You can check these things by watching the output from "lamboot -d".
>>
>> 1. On the command line for hboot, there are two important parameters:
>>    one is the IP address of where the lamboot agent was invoked, the
>>    other is the port number that the lamboot agent is expecting the
>>    newly-booted process to call back on (this will be a random
>>    integer).
>>
>> 2. Manually login to the remote machine and try to telnet to the port
>>    indicated on the hboot command line. For example,
>>        telnet <ipnumber> <portnumber>
>>    If all goes well, you should get a "Connection refused" error. If
>>    you get any other kind of error, it could indicate either of the
>>    two conditions above. Consult with your system/network
>>    administrator.
>> -----------------------------------------------------------------------------
>> n-1<6733> ssi:boot:base:linear: aborted!
>> n-1<6739> ssi:boot:base:linear: booting n0 (swingle)
>> n-1<6739> ssi:boot:base:linear: booting n1 (swingle3)
>> n-1<6739> ssi:boot:base:linear: finished
>> lamboot did NOT complete successfully
>>
>>
>> Tks,
>> Vinicius.
>
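
The telnet check described in step 2 of the help message can also be scripted. What follows is only a minimal sketch, not part of LAM/MPI: the function name probe_callback_port and its return strings are made up for illustration. It assumes you run it on the remote machine (swingle3 here) and pass the IP address and port number taken from the hboot command line shown by "lamboot -d".

```python
import socket


def probe_callback_port(host, port, timeout=5.0):
    """Try one TCP connection to host:port and classify the outcome.

    Hypothetical helper (not a LAM/MPI tool). It mirrors the manual
    "telnet <ipnumber> <portnumber>" check from the lamboot help text.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            # Something accepted the connection on that port.
            return "open"
    except ConnectionRefusedError:
        # The host was reached and actively rejected the connection.
        # Per the help text, this is the expected result of the telnet test.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout; packets may be dropped.
        return "timeout"
    except OSError as exc:
        # Any other failure, e.g. no route to host or a name lookup error.
        return "error: %s" % exc
```

Call it as probe_callback_port("<ipnumber>", <portnumber>) with the two values from the hboot command line. Going by the help message, "refused" is the good outcome; "timeout" or an "error: ..." result would point toward the filtering or routing conditions it lists, and is worth taking to your system/network administrator.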