LAM/MPI General User's Mailing List Archives

From: bcruchet_at_[hidden]
Date: 2006-12-09 09:32:43


Hi!

   Check your /etc/hosts (GNU/Linux?) and add these entries:

   10.101.11.45 cpu1
   10.101.11.58 cpu2
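   A minimal Python sketch for confirming the entries are actually in place,
   assuming a plain /etc/hosts file where each line is an address followed by
   one or more names. It only reads the file text and does not query the
   system resolver; parse_hosts and missing_entries are hypothetical helper
   names, not part of LAM:

```python
"""Check that /etc/hosts maps the cluster names to the expected addresses."""


def parse_hosts(text):
    """Return {name: address} from /etc/hosts-style text (first entry wins)."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # keep the first mapping seen
    return mapping


def missing_entries(text, expected):
    """Return the expected {name: address} pairs the hosts text does not provide."""
    mapping = parse_hosts(text)
    return {name: addr for name, addr in expected.items() if mapping.get(name) != addr}


if __name__ == "__main__":
    # The two nodes from this thread.
    wanted = {"cpu1": "10.101.11.45", "cpu2": "10.101.11.58"}
    try:
        with open("/etc/hosts") as hosts_file:
            print(missing_entries(hosts_file.read(), wanted) or "all entries present")
    except OSError as err:
        print(f"could not read /etc/hosts: {err}")
```

   Run it on both machines; an empty result means both names map to the
   expected addresses in that machine's file.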

   Sometimes the system makes a DNS query, and on some systems this takes a
long time.
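   To check whether slow name resolution really is the culprit, the lookup
   can be timed directly. This Python sketch assumes the delay comes from a
   reverse (address-to-name) lookup, which is an assumption: the message does
   not say which query the system makes. timed_reverse_lookup is a
   hypothetical helper, and the resolver is a parameter only so the function
   can be tested without a network:

```python
import socket
import time


def timed_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (seconds_elapsed, hostname_or_None) for a reverse lookup of ip."""
    start = time.perf_counter()
    try:
        hostname = resolver(ip)[0]  # gethostbyaddr returns (name, aliases, addresses)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        hostname = None
    return time.perf_counter() - start, hostname
```

   For example, calling timed_reverse_lookup("10.101.11.58") on 10.101.11.45
   before and after editing /etc/hosts should show whether the new entry
   makes the lookup fast, on systems whose /etc/nsswitch.conf consults files
   before DNS.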

:)

> Hi,
>
> I was trying to use the lamboot command with 2 CPUs. I made a
> hostfile on 10.101.11.45 like this:
>
> 10.101.11.45 user=manojv
> 10.101.11.58 user=manoj
>
> When I run $ lamboot hostfile, it takes a very long time and gives an
> error (pasted below). I am using a secured connection with ssh keys. I can
> connect to 10.101.11.58 without any password, and from 10.101.11.58 I can
> connect to 10.101.11.45.
>
> When I use the same command with the same hostfile on 10.101.11.58, it
> completes without any problem.
>
> I have made sure that both machines have the same version of
> LAM (7.1.1).
>
> Does anybody have an idea why I am not able to lamboot from 10.101.11.45?
>
>
> The error is pasted below for reference.
> Thanks.
>
> Error:
> -------------------------------------------------------------
> manojv_at_10.101.11.45 $ lamboot ~/host
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> eros: Connection refused
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "manoj_at_10.101.11.58".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This usually indicates an authentication problem with the remote
> agent, some other configuration type of error in your .cshrc or
> .profile file, or you were unable to executable a command on the
> remote node for some other reason. The following is a list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.101.11.58 -n -l manoj 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
>
>
> --
> manoj vaghela
> zeus numerix pvt ltd
> aerospace engineering department
> indian institute of technology bombay
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
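The LAM message above asks for its probe to be run by hand until it
succeeds. Note that it reports using "rsh" as the remote agent, although the
poster set up ssh keys. This Python sketch repeats the exact command LAM
printed and applies the criteria the message states (the command succeeds
and nothing is printed on stderr). remote_shell_check is a hypothetical
helper; the agent defaults to rsh only to match the log, and run is a
parameter so the logic can be tested without a remote host:

```python
import subprocess


def remote_shell_check(host, user, agent=("rsh",), run=subprocess.run, timeout=30):
    """Repeat LAM's probe: <agent> <host> -n -l <user> 'echo $SHELL'.

    Returns (ok, detail). ok is True only when the command exits with status 0,
    writes nothing to stderr, and prints a non-empty shell path on stdout.
    """
    # '$SHELL' is passed as a literal argument; the remote shell expands it.
    cmd = [*agent, host, "-n", "-l", user, "echo $SHELL"]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        return False, f"remote agent {agent[0]!r} not found"
    except subprocess.TimeoutExpired:
        return False, f"no answer within {timeout} s"
    if result.returncode != 0:
        return False, f"exit status {result.returncode}: {result.stderr.strip()}"
    if result.stderr:
        # The LAM message says nothing may be printed on stderr.
        return False, f"unexpected stderr: {result.stderr.strip()}"
    shell = result.stdout.strip()
    return (True, shell) if shell else (False, "empty output")
```

Calling remote_shell_check("10.101.11.58", "manoj") mirrors the command in
the error. If it fails with rsh while the ssh login works, the problem lies
in which remote agent LAM is using rather than in the keys; the "Booting
LAM" FAQ section the message points to covers how that agent is chosen.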