There are a couple of things to check:
1) Make sure you don't have the environment variable LAMRSH set, as it
will override the compiled default.
2) Make sure you have the same installation of LAM on both nodes.
I'm guessing it'll turn out to be one of those two things.
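
Both checks can be roughly sketched as a small script to run on each node and compare by eye. This is only an illustration, not part of LAM: the function names report_lamrsh and report_lam_install are made up here, the variable is assumed to be spelled LAMRSH as in Jeff's earlier message, and the printed messages are labels chosen for this sketch rather than anything LAM itself prints.

```shell
#!/bin/sh
# Hypothetical helper: report whether the LAMRSH override is set in this shell.
report_lamrsh() {
    if [ -n "${LAMRSH:-}" ]; then
        echo "LAMRSH is set to: $LAMRSH (overrides the compiled default)"
    else
        echo "LAMRSH is not set (compiled default will be used)"
    fi
}

# Hypothetical helper: show which lamboot is found first in PATH on this node,
# so the output can be compared between the two machines.
report_lam_install() {
    if command -v lamboot >/dev/null 2>&1; then
        echo "lamboot found at: $(command -v lamboot)"
    else
        echo "lamboot not found in PATH"
    fi
}

report_lamrsh
report_lam_install
```

If the first line shows LAMRSH set to rsh on the host, running "unset LAMRSH" in that shell before lamboot should let the compiled default take effect again. If the lamboot paths differ between the nodes, that would point to the second item.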
Brian
On Thu, 4 Jun 2009, Yogesh Aher wrote:
> Thanks again!
> I checked and I found the difference.
> From the host, one of the output lines is: n-1<28382> ssi:boot:rsh:
> attempting to execute: rsh 100.120.10.41 -n 'echo $SHELL'
> Whereas from the client, when I run lamboot, the same line is: n-1<6900>
> ssi:boot:rsh: attempting to execute: /usr/bin/ssh -x 120.100.10.04 -n
> 'echo $SHELL'
>
> This is despite installing lam-7.1.4 with the path to ssh specified
> (option: --with-rsh="/usr/bin/ssh -x").
>
> How can I make the host use /usr/bin/ssh and not rsh?
>
>
> On Thu, Jun 4, 2009 at 1:36 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> You should be able to see the details of exactly what lamboot
> is doing if you use the "-d" option on the command line.
>
>
>
> On Jun 4, 2009, at 6:55 AM, Yogesh Aher wrote:
>
> Dear Jeff,
>
> Thank you very much for drawing my attention to this
> point. I installed LAM again with the option pointing
> to the path of ssh (/usr/bin/ssh), but I got the same
> error again. I thought that since the paths now match
> on both machines it should work, but it doesn't! :(
> On both machines, the path for mpich/mpicc is
> /usr/local/bin and the path for ssh is /usr/bin.
>
> I look forward to suggestions for any other checks I
> need to do.
>
> Thanking you,
>
> Sincerely,
> Y.
>
> On Wed, Jun 3, 2009 at 6:06 PM, Jeff Squyres
> <jsquyres_at_[hidden]> wrote:
> I'm a little confused -- you mention that ssh is in
> /usr/bin/ssh, but you configured LAM with
> --with-rsh=/bin/ssh, not --with-rsh=/usr/bin/ssh.
>
> Is there a reason for the difference?
>
> Note that LAM may not be well setup to handle ssh being
> installed in multiple different locations across
> different nodes; I honestly don't remember. :-(
>
> IIRC, you can also set the env variable LAMRSH at
> run-time to change the location of your "rsh" binary
> (e.g., /usr/bin/ssh vs. /bin/ssh).
>
>
>
> On Jun 3, 2009, at 11:31 AM, Yogesh Aher wrote:
>
> I installed both openssh (openssh-5.2p1) and ssh
> (ssh-2.4.0), as a user as well as root, with
> "prefix=/bin", but they are still being installed in /usr/bin.
>
>
> On Wed, Jun 3, 2009 at 5:22 PM, Jeff Squyres
> <jsquyres_at_[hidden]> wrote:
> It sounds like ssh is not installed on your other node.
>
>
> On Jun 3, 2009, at 11:07 AM, Yogesh Aher wrote:
>
> Dear Brian,
>
> Thanks for your prompt reply.
>
> I ran this command on both (host and client)
> machines, but both give the same message:
>
> -bash: /bin/ssh: No such file or directory
>
> I installed LAM with the option ./configure
> --with-rsh="/bin/ssh -x"
>
> Also, as I'm planning to use passwordless ssh, I
> couldn't find the .rhosts and .cshrc/.profile files.
>
> Any suggestions about it?
>
> Cheers,
> Y.
>
>
> On Wed, Jun 3, 2009 at 4:59 PM, Brian W. Barrett
> <brbarret_at_[hidden]> wrote:
> Did you try to follow any of the suggestions in the
> error message you cut-n-paste into your e-mail to the
> list? In particular, does the command:
>
>
> /bin/ssh -x 100.120.10.41 -n 'echo $SHELL'
>
> work properly?
>
>
> Brian
>
>
>
> On Wed, 3 Jun 2009, Yogesh Aher wrote:
>
> Dear LAM-users,
>
> I am stuck again getting LAM to work for charm++. I
> installed ssh, openssh, libaio and other necessary
> libraries (as suggested in earlier archives) again, but
> still get the following error. If anybody has come
> across this error, could you please let me know how to
> resolve it? Also, please let me know if any permission
> changes need to be made.
>
> [sam_at_xyz Linux-i686-MPI]$ lamboot -v /home/sam/.nodelist
>
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<962> ssi:boot:base:linear: booting n0 (100.120.10.04)
> n-1<962> ssi:boot:base:linear: booting n1 (100.120.10.41)
> -----------------------------------------------------------------------------
> LAM failed to execute a process on the remote node
> "100.120.10.41".
> LAM was not trying to invoke any LAM-specific commands
> yet -- we were
> simply trying to determine what shell was being used on
> the remote
> host.
>
> LAM tried to use the remote agent command "/bin/ssh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS
> SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI
> FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE
> LAM/MPI USER'S
> *** MAILING LIST.
>
> This usually indicates an authentication problem with
> the remote
> agent, some other configuration type of error in your
> .cshrc or
> .profile file, or you were unable to executable a
> command on the
> remote node for some other reason. The following is a
> list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote
> machine
> - Incorrect permissions on your home directory
> (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file
> (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts
> file (if you
> are using rsh) for the machine and username that
> you are
> running from
> - Your .cshrc/.profile must not print anything out
> to the
> standard error
> - Your .cshrc/.profile should set a correct TERM
> type
> - Your .cshrc/.profile should set the SHELL
> environment
> variable to your default shell
>
> Try invoking the following command at the unix command
> line:
>
> /bin/ssh -x 100.120.10.41 -n 'echo $SHELL'
>
> You will need to configure your local setup such that
> you will *not*
> be prompted for a password to invoke this command on
> the remote node.
> No output should be printed from the remote node before
> the output of
> the command is displayed.
>
> When you can get this command to execute successfully
> by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<962> ssi:boot:base:linear: Failed to boot n1 (100.120.10.41)
> n-1<962> ssi:boot:base:linear: aborted!
> n-1<967> ssi:boot:base:linear: booting n0 (100.120.10.04)
> n-1<967> ssi:boot:base:linear: booting n1 (100.120.10.41)
> -----------------------------------------------------------------------------
> .
> .
> .
>
> When you can get this command to execute successfully
> by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<967> ssi:boot:base:linear: Failed to boot n1 (100.120.10.41)
> n-1<967> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully
>
>
> Thanking you in advance!
>
> Sincerely,
> Yogesh
>
>
> --
> Brian Barrett
> LAM/MPI Developer
> Make today a LAM/MPI day!
> _______________________________________________
> This list is archived at
> http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>
>
>
>
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!