Great!!! It was a firewall problem. Now my LAM/MPI is 100% happy. Thank you very much!!!
Brian Barrett <brbarret_at_[hidden]> wrote: Have you tried the suggestions in the error message? (You say you've
tried solutions, but don't say what they are.) This is almost always
caused by a software firewall running on one of the nodes you are using.
Brian
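The telnet check suggested in the error message can also be scripted. The following is a minimal sketch, assuming Python 3 is available on the remote node. The function name `probe_tcp_port` and the 3-second default timeout are illustrative choices, not part of LAM. It separates a refused connection (the host answered, but nothing was listening) from a silent timeout, which usually means packets are being dropped along the way, for example by a software firewall.

```python
import socket


def probe_tcp_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt to host:port (hypothetical helper).

    Returns one of:
      "open"           - something accepted the connection
      "refused"        - the host answered with a reset: nothing is listening
                         there, which the LAM message calls the "good" result
      "timeout"        - no answer within `timeout` seconds, typical of a
                         firewall that silently drops packets
      "error:<reason>" - any other failure, e.g. "No route to host"
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        # Covers unreachable hosts/networks and name-resolution failures.
        return f"error:{exc.strerror or exc}"
```

As with telnet, run it on the remote node against the IP address and port shown on the hboot command line in the `lamboot -d` output. A firewall configured to reject rather than drop can also produce a refusal, so "refused" is a necessary but not conclusive sign that the path is open.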
On Mar 6, 2008, at 8:13 AM, zayar wrote:
> Dear members,
> I have a problem with lamboot. I also found this topic on the
> FAQ page. I have tried the possible solutions, but I still get the error. When
> booting LAM/MPI on openSUSE 10.3, I got the following error messages:
>
> zayar_at_HPC-3:~>lamboot -v bhost
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<25538> ssi:boot:base:linear: booting n0 (HPC-3)
> n-1<25538> ssi:boot:base:linear: booting n1 (HPC-2)
> -----------------------------------------------------------------------------
> The lamboot agent timed out while waiting for the newly-booted process
> to call back and indicated that it had successfully booted.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> As far as LAM could tell, the remote process started properly, but
> then never called back. Possible reasons that this may happen:
>
> - There are network filters between the lamboot agent host and
> the remote host such that communication on random TCP ports
> is blocked
> - Network routing from the remote host to the local host isn't
> properly configured (this is uncommon)
>
> You can check these things by watching the output from "lamboot -d".
>
> 1. On the command line for hboot, there are two important parameters:
> one is the IP address of where the lamboot agent was invoked, the
> other is the port number that the lamboot agent is expecting the
> newly-booted process to call back on (this will be a random
> integer).
>
> 2. Manually login to the remote machine and try to telnet to the port
> indicated on the hboot command line. For example,
> telnet <ip-address> <port-number>
> If all goes well, you should get a "Connection refused" error. If
> you get any other kind of error, it could indicate either of the
> two conditions above. Consult with your system/network
> administrator.
> -----------------------------------------------------------------------------
> n-1<25538> ssi:boot:base:linear: aborted!
> n-1<25544> ssi:boot:base:linear: booting n0 (HPC-3)
> n-1<25544> ssi:boot:base:linear: booting n1 (HPC-2)
> n-1<25544> ssi:boot:base:linear: finished
> lamboot did NOT complete successfully
> zayar_at_HPC-3:~> telnet (my-remote-ip) 23451
> Trying (my-remote-ip)...
> telnet: connect to address (my-remote-ip): Connection refused
> zayar_at_HPC-3:~> telnet 127.0.0.1 32154
> Trying 127.0.0.1...
> telnet: connect to address 127.0.0.1: Connection refused
> zayar_at_HPC-3:~> ssh -x hpc-2 hostname
> HPC-2
> zayar_at_HPC-3:~>
> Please advise me.
> Thanks.
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!