This list is archived at http://www.lam-mpi.org/MailArchives/lam/
On 6/20/08, wzlu <wzlu_at_[hidden]> wrote:
> Hi all,
>
> I have a problem running lamboot and recon.
> The rsh test is OK, but lamboot and recon do not work.
> The messages follow; please tell me how to boot LAM. Thanks a lot.
>
> Best Regards
> Wei-Zhao Lu
>
> rsh message:
> [wzlu_at_in04033 lam_test]$ rsh 10.0.4.34 -n 'echo $SHELL'
> connect to address 10.0.4.34: Connection refused
> Trying krb4 rsh...
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> /bin/bash
>
> lamboot message:
> [wzlu_at_in04033 lam_test]$ lamboot -v host
>
> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>
> n-1<4940> ssi:boot:base:linear: booting n0 (10.0.4.33)
> n-1<4940> ssi:boot:base:linear: booting n1 (10.0.4.34)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.0.4.34: Connection refused
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "10.0.4.34",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a (non-inclusive) list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.0.4.34 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<4940> ssi:boot:base:linear: Failed to boot n1 (10.0.4.34)
> n-1<4940> ssi:boot:base:linear: aborted!
> n-1<4945> ssi:boot:base:linear: booting n0 (10.0.4.33)
> n-1<4945> ssi:boot:base:linear: booting n1 (10.0.4.34)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.0.4.34: Connection refused
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "10.0.4.34",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a (non-inclusive) list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.0.4.34 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<4945> ssi:boot:base:linear: Failed to boot n1 (10.0.4.34)
> n-1<4945> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully
>
> recon message:
> [wzlu_at_in04033 lam_test]$ recon -v host
> n-1<4949> ssi:boot:base:linear: booting n0 (10.0.4.33)
> n-1<4949> ssi:boot:base:linear: booting n1 (10.0.4.34)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> connect to address 10.0.4.34: Connection refused
> connect to address 10.0.4.34: Connection refused
> trying normal rsh (/usr/bin/rsh)
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "10.0.4.34",
> but received some output on the standard error. This heuristic
> assumes that any output on the standard error indicates a fatal error,
> and therefore aborts. You can disable this behavior (i.e., have LAM
> ignore output on standard error) in the rsh boot module by setting the
> SSI parameter boot_rsh_ignore_stderr to 1.
>
> LAM tried to use the remote agent command "rsh"
> to invoke "echo $SHELL" on the remote node.
>
> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
> *** MAILING LIST.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a (non-inclusive) list of items
> that you should check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> rsh 10.0.4.34 -n 'echo $SHELL'
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n-1<4949> ssi:boot:base:linear: Failed to boot n1 (10.0.4.34)
> n-1<4949> ssi:boot:base:linear: aborted!
>
>
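A note on the output quoted above: the manual rsh test does print /bin/bash, but it also prints the "connect to address 10.0.4.34: Connection refused" lines, and the lamboot message reports receiving exactly those lines on stderr. The heuristic the message describes (any stderr output from the remote agent counts as fatal unless boot_rsh_ignore_stderr is set to 1) can be sketched in shell roughly as below. This is only an illustration under stated assumptions, not LAM's actual code: probe_agent and fake_rsh are hypothetical names made up for the example, and fake_rsh merely imitates an agent that succeeds but is noisy on stderr.

```shell
#!/bin/sh
# Hypothetical sketch of the "stderr means fatal" heuristic described in
# the lamboot message. Not LAM's implementation.
#
# usage: probe_agent <ignore_stderr: 0|1> <agent command...>
probe_agent() {
    ignore_stderr=$1
    shift
    errfile=$(mktemp) || return 1

    # Run the agent command, keeping stdout and stderr separate.
    out=$("$@" 2>"$errfile")
    status=$?

    # Unless told to ignore it, any stderr output at all is treated as fatal.
    if [ "$ignore_stderr" != "1" ] && [ -s "$errfile" ]; then
        echo "unexpectedly received the following on stderr:" >&2
        cat "$errfile" >&2
        rm -f "$errfile"
        return 1
    fi
    rm -f "$errfile"

    # Otherwise the agent's own exit status decides.
    [ "$status" -eq 0 ] || return "$status"
    printf '%s\n' "$out"
}

# Hypothetical stand-in for an rsh that works but first prints a
# connection message on stderr, as in the quoted output.
fake_rsh() {
    echo "connect to address 10.0.4.34: Connection refused" >&2
    echo /bin/bash
}

# probe_agent 0 fake_rsh    # fails: stderr output is treated as fatal
# probe_agent 1 fake_rsh    # succeeds: stderr is ignored, stdout is printed
```

If that is what is happening here, the two options the message itself points to are making rsh silent on stderr, or setting the SSI parameter boot_rsh_ignore_stderr to 1. LAM 7.x tools generally take SSI parameters as "-ssi <name> <value>", so the invocation would probably look like "lamboot -ssi boot_rsh_ignore_stderr 1 -v host", but please check lamboot(1) on your installation for the exact syntax.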