You may want to read the LAM documentation and/or FAQ; it details the
prerequisites for lamboot to succeed. :-)

You now have a PATH problem -- the LAM command "hboot" is not being
found on the remote node. You need to ensure that your PATH is set
properly in your shell startup files so that the LAM executables can be
found on every node that you're trying to lamboot.
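
As a rough sketch of how you might check this (the check_lam_path helper and the /usr/local/lam/bin prefix are illustrative assumptions, not part of LAM; substitute wherever your LAM binaries actually live), something along these lines shows whether a given PATH value can find the LAM commands:

```shell
# check_lam_path SEARCH_PATH CMD...
# Hypothetical helper: prints "ok" if every CMD resolves when SEARCH_PATH is
# used as the PATH; otherwise prints "missing: CMD" for the first command
# that cannot be found and returns 1.
check_lam_path() {
    search_path=$1
    shift
    for cmd in "$@"; do
        # The subshell keeps the PATH change from leaking into the caller.
        if ! ( PATH=$search_path; command -v "$cmd" ) >/dev/null 2>&1; then
            echo "missing: $cmd"
            return 1
        fi
    done
    echo "ok"
}

# 1. In the startup file that non-interactive shells read on each node
#    (for bash this is typically ~/.bashrc), add a line such as the
#    following, where /usr/local/lam/bin is an assumed install prefix:
#        export PATH="/usr/local/lam/bin:$PATH"
#
# 2. Then, from the node where you run lamboot, confirm what the remote
#    non-interactive shell sees (mpi202 is the node from the quoted error):
#        rsh mpi202 -n 'echo $PATH; which hboot tkill'
```

Running check_lam_path "/usr/local/lam/bin:$PATH" hboot tkill on a node is a quick local test; the rsh command in step 2 is the one that matters, since lamboot depends on the PATH of the remote non-interactive shell.
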

Check out the LAM/MPI User Guide and/or the FAQ under the section
"Booting LAM" for more details.
On Oct 21, 2004, at 10:03 AM, "" <wdz03_at_[hidden]> wrote:
> I did as you suggested, but I get the following output:
>
> ---------------------------------------
> LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University
>
> n0<15170> ssi:boot:base:linear: booting n0 (mpi201)
> n0<15170> ssi:boot:base:linear: booting n1 (mpi202)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> bash: line 1: hboot: command not found
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "mpi202",
> but received some output on the standard error.
>
> LAM tried to use the remote agent command "/usr/bin/rsh"
> to invoke "hboot" on the remote node.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a list of items that you may
> wish to check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> /usr/bin/rsh mpi202 -n hboot -t -c lam-conf.lamd -v -s -I "-H 168.0.0.201 -P 32805 -n 1 -o 0"
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n0<15170> ssi:boot:base:linear: Failed to boot n1 (mpi202)
> n0<15170> ssi:boot:base:linear: aborted!
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> n0<15176> ssi:boot:base:linear: booting n0 (mpi201)
> n0<15176> ssi:boot:base:linear: booting n1 (mpi202)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> bash: line 1: tkill: command not found
> -----------------------------------------------------------------------------
> LAM attempted to execute a process on the remote node "mpi202",
> but received some output on the standard error.
>
> LAM tried to use the remote agent command "/usr/bin/rsh"
> to invoke "tkill" on the remote node.
>
> This can indicate an authentication error with the remote agent, or
> can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> $HOME/.profile files. The following is a list of items that you may
> wish to check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> /usr/bin/rsh mpi202 -n tkill -v
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
> n0<15176> ssi:boot:base:linear: Failed to boot n1 (mpi202)
> n0<15176> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully
>
> _____________________________________________________
>
>
>> From: Jeff Squyres <jsquyres_at_[hidden]>
>> Reply-To: General LAM/MPI mailing list <lam_at_[hidden]>
>> To: General LAM/MPI mailing list <lam_at_[hidden]>
>> Subject: Re: LAM: kerberos blocks rsh
>>
>> Are you exporting LAMRSH to the environment? Note that it's not
>> sufficient to (assuming bash):
>>
>> $ LAMRSH=rsh
>> $ lamboot ...
>>
>> You need to export LAMRSH so that lamboot will see it:
>>
>> $ LAMRSH=rsh
>> $ export LAMRSH
>> $ lamboot ...
>>
>>
>> On Oct 21, 2004, at 8:58 AM, "" <wdz03_at_[hidden]> wrote:
>>
>>> http://lam.squyres.com/MailArchives/lam/msg02364.php
>>> "default remote shell uses kerberos leading to lamboot faliure"
>>>
>>> -----------------------------------
>>> Hi,
>>> I am facing yet another problem. While compiling LAM I did NOT select
>>> any rsh, so my rsh was automatically selected to be:
>>> /usr/kerberos/bin/rsh
>>> (On RHL-6.2 P-3)
>>> However, in the absence of a Kerberos authentication mechanism, commands
>>> like
>>> /usr/kerberos/bin/rsh arjun2.aero.iitb.ernet.in -n echo $SHELL
>>> yield the following output:
>>> arjun2.aero.iitb.ernet.in: Connection refused
>>> Trying krb4 rsh...
>>> arjun2.aero.iitb.ernet.in: Connection refused
>>> trying normal rsh (/usr/bin/rsh)
>>> /bin/sh
>>> Because the first four lines are printed on stderr, lamboot refuses to
>>> boot and recon also fails.
>>> I tried setting LAMRSH=/usr/bin/rsh and then executing lamboot, but LAM
>>> still tried to use the remote agent command "/usr/kerberos/bin/rsh"
>>> to invoke "echo $SHELL" on the remote node.
>>> I have a dirty trick in mind, i.e. to
>>> cp /usr/bin/rsh /usr/kerberos/bin/rsh (which I hope should work)
>>> But before I do anything of that sort, I would like to ask why setting
>>> LAMRSH did not give me any reprieve. Also, if any elegant solution exists,
>>> please let me know.
>>>
>>> Thanking you in advance.
>>>
>>>
>>> *******************************************************************************
>>>
>>> snail mail addresses:-
>>>
>>> Kaustuv, # Kaustuv,
>>> H-6, #255; # C/o Dr. N. R. Prasad,
>>> IIT BOMBAY; # East Patel Nagar, Rd. No.1;
>>> POWAI -400076 # Patna -800023;
>>> MAHARASTRA. {ph: 91-22-7460551}# Bihar . {ph:91-612-281274}
>>> *******************************************************************************
>>>
>>> --------------------------------------------------------------------
>>> There are no answers to this question
>>>
>>> I have run into the same problem. How can I solve it? Thank you!
>>>
>>> regards
>>> Caregx
>>>
>>>
>>> _______________________________________________
>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>
>>
>> --
>> {+} Jeff Squyres
>> {+} jsquyres_at_[hidden]
>> {+} http://www.lam-mpi.org/
>>
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/