LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-07-25 18:39:03


Greetings.

We got your mail yesterday -- sometimes it takes us a little while to
reply; there was no need to re-post. :-)

The error message you cut-n-pasted actually contains a lot of
information and suggestions intended to help users help themselves.
Have you tried them?

It looks like you are running into a common first-time-user problem of
not being able to launch processes on a remote node. Indeed, it looks
like you are trying to use rsh and there is no corresponding rsh daemon
running on 192.168.11.13 (as indicated by the "Connection refused"
message). You might want to look at the "Booting LAM" section of the
LAM/MPI FAQ (on the main web site) and the "Getting Started" section of
the LAM/MPI User's Guide, and/or consult your local system
administrator (perhaps your problem can be solved by switching to
ssh...? It depends on your local setup).
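
The by-hand check suggested in the error message can be wrapped in a small script. What follows is only a sketch: check_agent is a hypothetical helper (not part of LAM), and it assumes a POSIX sh with mktemp available. It runs `echo $SHELL` on the remote node through the given agent and reports a failure if the agent exits non-zero, prints anything on stderr, or prints anything other than a single line of output.

```shell
#!/bin/sh
# Hypothetical helper, not part of LAM: verify that a remote agent can run
# `echo $SHELL` on a host with no stray output.
# Usage: check_agent AGENT HOST     e.g. check_agent rsh 192.168.11.13
check_agent() {
    agent=$1
    host=$2
    errfile=$(mktemp) || return 2

    # $agent is deliberately unquoted so an agent such as "ssh -x" splits
    # into a command plus its options. -n keeps the agent from reading
    # stdin. The single quotes make the *remote* shell expand $SHELL.
    out=$($agent -n "$host" 'echo $SHELL' 2>"$errfile")
    status=$?
    err=$(cat "$errfile")
    rm -f "$errfile"

    if [ "$status" -ne 0 ]; then
        echo "FAIL: '$agent' exited with status $status: $err"
        return 1
    fi
    if [ -n "$err" ]; then
        echo "FAIL: unexpected output on stderr: $err"
        return 1
    fi
    # Anything besides one line (e.g. a login banner printed by .cshrc or
    # .profile) counts as a failure.
    lines=$(( $(printf '%s\n' "$out" | wc -l) ))
    if [ -z "$out" ] || [ "$lines" -ne 1 ]; then
        echo "FAIL: expected exactly one line of output, got: $out"
        return 1
    fi
    echo "OK: remote shell is $out"
}
```

For example, `check_agent rsh 192.168.11.13` repeats the test from the error message, and `check_agent "ssh -x" 192.168.11.13` tries ssh instead. A password prompt will still appear on the terminal if one is required, which itself means the setup is not yet ready for lamboot. If only ssh works, the LAM documentation describes a LAMRSH environment variable (e.g., LAMRSH="ssh -x") for selecting the remote agent; whether that is the right fix depends on your LAM version and local setup.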

Hope that helps.

On Jul 25, 2004, at 5:57 PM, Imran Ahmed khan wrote:

> Hi,
> I have executed LAM on the local system and it's working fine.
>
> But when I try to execute lamboot for more than one system, I'm
> getting this error:
>
> lamboot -v hostfile
> ------------------------
> LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
>
> n-1<7977> ssi:boot:base:linear: booting n0 (192.168.11.27)
> n-1<7977> ssi:boot:base:linear: booting n1 (192.168.11.13)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> ntws513tt.ssuet.edu.pk: Connection refused
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "192.168.11.13".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> This usually indicates an authentication problem with the remote
> agent, or some other configuration type of error in your .cshrc or
> .profile file. The following is a list of items that you may wish to
> check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 192.168.11.13 -n echo $SHELL
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<7977> ssi:boot:base:linear: Failed to boot n1 (192.168.11.13)
> n-1<7977> ssi:boot:base:linear: aborted!

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/