LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-09-07 07:04:09


Did you follow all the suggestions in the help message?
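
One thing the help message points at: the log shows LAM using "rsh" as its remote agent, while the setup described in the quoted message relies on passwordless ssh. The following is a minimal sketch of running the suggested check by hand with either agent. The helper name probe_cmd is hypothetical, and the "ssh -x" agent and the LAMRSH environment variable are assumptions about this installation rather than something shown in the log; check the "BOOTING LAM" section of the FAQ before relying on them.

```shell
#!/bin/sh
# Sketch only: print the shell probe that lamboot performs, for a given agent.
# probe_cmd is a hypothetical helper; "ssh -x" and LAMRSH are assumptions.

# Build the probe command for a remote agent and host.
probe_cmd() {
    agent=$1   # remote agent, e.g. "rsh" or "ssh -x"
    host=$2    # remote node, e.g. 10.1.7.129
    printf "%s %s -n 'echo \$SHELL'\n" "$agent" "$host"
}

# The command from the help message (this is what lamboot tried):
probe_cmd rsh 10.1.7.129

# The same probe through ssh, matching the passwordless ssh setup:
probe_cmd "ssh -x" 10.1.7.129

# If the ssh form runs by hand without a password prompt and without
# extra output, LAM can be pointed at it before retrying (run on the cluster):
# LAMRSH="ssh -x"; export LAMRSH
# lamboot -v lamhost
```

Run the printed commands by hand on the node where lamboot is started. If the rsh form fails with "Connection refused" but the ssh form prints the remote shell, the remote agent setting is the likely culprit.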

Additionally, if you are just starting out with MPI, I would suggest
that you use Open MPI, not LAM/MPI. LAM/MPI is deprecated -- all the
developers moved to Open MPI several years ago.

On Sep 7, 2009, at 4:33 AM, ankur pachauri wrote:

> Dear all,
>
> I need to set up a cluster of two nodes for lamboot. I am using Fedora 7.
> 1. Passwordless ssh is set up.
> 2. A folder in the home directory is NFS-mounted on both the
> server (IP 10.1.7.136) and the client (IP 10.1.7.129).
> 3. LAM 7.1.1 is installed on the system.
> 4. A file named lamhost has been created, containing the IPs of the
> server and client respectively.
>
> [lamuser_at_localhost lam]$ cat lamhost
> 10.1.7.136
> 10.1.7.129
>
>
> lamboot works on each system individually, but it fails when run with the host file:
>
> [lamuser_at_localhost ~]$ lamboot -v lamhost
>
> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> n-1<3088> ssi:boot:base:linear: booting n0 (10.1.7.136)
> n-1<3088> ssi:boot:base:linear: booting n1 (10.1.7.129)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> 10.1.7.129: Connection refused
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "10.1.7.129".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This usually indicates an authentication problem with the remote
> agent, some other configuration type of error in your .cshrc or
> .profile file, or you were unable to executable a command on the
> remote node for some other reason. The following is a list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.1.7.129 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<3088> ssi:boot:base:linear: Failed to boot n1 (10.1.7.129)
> n-1<3088> ssi:boot:base:linear: aborted!
> n-1<3093> ssi:boot:base:linear: booting n0 (10.1.7.136)
> n-1<3093> ssi:boot:base:linear: booting n1 (10.1.7.129)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> 10.1.7.129: Connection refused
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node "10.1.7.129".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This usually indicates an authentication problem with the remote
> agent, some other configuration type of error in your .cshrc or
> .profile file, or you were unable to executable a command on the
> remote node for some other reason. The following is a list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.1.7.129 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<3093> ssi:boot:base:linear: Failed to boot n1 (10.1.7.129)
> n-1<3093> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully
>
>
>
>
> Please help.
>
> --
> Ankur Pachauri.
> 09927590910
>
> Research Scholar,
> software engineering.
> Department of Mathematics
> Dayalbagh Educational Institute
> Dayalbagh,
> AGRA
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
jsquyres_at_[hidden]