Change your PATH so that the native rsh comes before the KRB4 one.
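A rough sketch of what that could look like in a Bourne-style shell. The directory names are assumptions, not taken from your output beyond /usr/bin/rsh: I am guessing the Kerberos rsh sits in /usr/kerberos/bin, so check with `type -a rsh` first. `prepend_dir` is just a hypothetical helper name.

```shell
# Hypothetical helper: move one directory to the front of a PATH-style string.
# Assumption: the native rsh is /usr/bin/rsh (as in the quoted output) and the
# Kerberos one lives elsewhere, e.g. /usr/kerberos/bin. Verify on your system.
prepend_dir() {
    dir=$1
    path=$2
    rest=
    old_ifs=$IFS
    IFS=:
    # Walk the colon-separated entries and drop any existing copy of $dir.
    for entry in $path; do
        [ "$entry" = "$dir" ] && continue
        rest="${rest:+$rest:}$entry"
    done
    IFS=$old_ifs
    # Emit $dir first, followed by the remaining entries in their old order.
    printf '%s\n' "$dir${rest:+:$rest}"
}

# Put the native rsh's directory first, then show which rsh is now found.
PATH=$(prepend_dir /usr/bin "$PATH")
export PATH
command -v rsh || echo "rsh not found in PATH"
```

The change has to be in effect for the shell that runs lamboot (for example via your shell startup file), since LAM invokes plain "rsh" and picks up whichever one is found first. The error text itself also mentions a workaround, which I believe can be passed as `lamboot -ssi boot_rsh_ignore_stderr 1 -v host`, but fixing PATH avoids the failed krb4 attempts altogether.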
HTH
On 6/20/08, wzlu <wzlu_at_[hidden]> wrote:
> Hi All,
>
> I have a problem running lamboot and recon.
> The rsh test is OK, but lamboot and recon do not work.
> The messages follow; please tell me how to boot LAM. Thanks a lot.
>
> Best Regards
> Wei-Zhao Lu
>
> rsh message
> [wzlu_at_in04033 lam_test]$ rsh 10.0.4.34 -n 'echo $SHELL'
> connect to address 10.0.4.34: Connection refused
> Trying krb4 rsh...
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> /bin/bash
>
> lamboot message
> [wzlu_at_in04033 lam_test]$ lamboot -v host
>
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<4940> ssi:boot:base:linear: booting n0 (10.0.4.33)
> n-1<4940> ssi:boot:base:linear: booting n1 (10.0.4.34)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.0.4.34: Connection refused
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "10.0.4.34",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a (non-inclusive) list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.0.4.34 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<4940> ssi:boot:base:linear: Failed to boot n1 (10.0.4.34)
> n-1<4940> ssi:boot:base:linear: aborted!
> n-1<4945> ssi:boot:base:linear: booting n0 (10.0.4.33)
> n-1<4945> ssi:boot:base:linear: booting n1 (10.0.4.34)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.0.4.34: Connection refused
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "10.0.4.34",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a (non-inclusive) list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.0.4.34 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<4945> ssi:boot:base:linear: Failed to boot n1 (10.0.4.34)
> n-1<4945> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully
>
> recon message:
> [wzlu_at_in04033 lam_test]$ recon -v host
> n-1<4949> ssi:boot:base:linear: booting n0 (10.0.4.33)
> n-1<4949> ssi:boot:base:linear: booting n1 (10.0.4.34)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.0.4.34: Connection refused
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "10.0.4.34",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a (non-inclusive) list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.0.4.34 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<4949> ssi:boot:base:linear: Failed to boot n1 (10.0.4.34)
> n-1<4949> ssi:boot:base:linear: aborted!
>
>