LAM/MPI General User's Mailing List Archives

From: Prabhanjan Kambadur (pkambadu_at_[hidden])
Date: 2004-04-08 14:30:50


> Hi,
>
> 1- "passwordless rsh" from Master to Slave and from Slave to Master is done
> successfully with only message "last login: Thursday 08, 2004 from Master"
> 2- There is no .cshrc/.profile in the usr's Home directory
> 3- No /etc/.cshrc and /etc/.login files, instead there are csh.cshrc and
> csh.login files in /etc directory
> 4- I think that it doesn't find the " hboot " to execute,i include path "/usr/local/bin"
> in /etc/profile, file but it didnot work.
>
Correct

> 5- How can i come to know that .cshrc/.profile will not print anything to standard error?
>

If you simply log in to the machine, whatever is printed at login is what
will always be printed. Most of it goes to stdout, but you can check
specifically for error messages by reading through the output when you
log in. Error messages will be printed with an "ERROR" prefix. This
sometimes happens because a line in your ".profile" or ".cshrc" (or some
other startup file, depending on which shell you are using) is producing
the error.
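
Reading the login output by eye can miss things, so here is a small sketch
of a way to look at stderr on its own. The helper name "stderr_of" is made
up for this example; it runs whatever command you give it and prints only
what that command writes to stderr.

```shell
# Run a command and print only what it writes to stderr.
# The order of the redirections matters: stderr is first pointed at the
# current stdout, and only then is the command's stdout sent to /dev/null.
# "stderr_of" is a hypothetical helper name used only in this sketch.
stderr_of() {
    "$@" 2>&1 >/dev/null
}
```

For example, "stderr_of rsh Slave true" (Slave being the host name from your
message) should print nothing if the startup files are quiet on stderr.
Keep in mind that a non-interactive rsh may not read the same startup files
as an interactive login, so the two checks can give different results.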

> 6- .cshrc/.profile has set a correct term type?
>

Your term type should not matter much for execution on remote machines,
since we do not use it. You will notice that the output of MPI programs
run with LAM always appears on the same terminal that was used for
"mpirun".

> 7- .cshrc/.profile has set the SHELL environment variable to the default shell?
>
> ****************************************************
> ERROR DISPLAYED
> ****************************************************
> [ahmed_at_Master ahmed]$ lamboot -v -ssi boot rsh /home/ahmed/lamhost
>
> LAM 7.0.3/MPI 2 C++/ROMIO - Indiana University
>
> n0<3345> ssi:boot:base:linear: booting n0 (Slave)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> bash: line 1: hboot: command not found
>

As you said, this clearly shows that the right PATH is not set: the
executable "hboot" is not being found. Make sure that all the LAM
executables are in your PATH environment variable on all the machines you
are using. A simple test is to log in to each machine from the command
line and check what $PATH is. Check how to set the PATH correctly for
whichever shell you are using. On a machine that you reach with ssh, the
check would look like the commands below. I am sure rsh has something
similar (sorry, but I can only point you to the man pages here :-))

# ssh <machine-name> -n 'echo $PATH'
OR
# ssh <machine-name> -n 'echo `which hboot`'
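
If you have more than a couple of nodes, the same check can be looped over
the boot schema file. The following is only a rough sketch, not something
that ships with LAM: the function names and the REMOTE_SHELL variable are
invented for this example, and it assumes that each line of the boot schema
starts with a host name and that "#" begins a comment.

```shell
# Print the host name (first field) of each entry in a boot schema file,
# skipping blank lines and anything after a "#".
# Assumption: the host name is the first whitespace-separated field.
hosts_from_schema() {
    sed -e 's/#.*//' "$1" | awk 'NF { print $1 }'
}

# Ask every host in the schema where hboot lives.
# REMOTE_SHELL is a hypothetical variable for this sketch; set it to
# rsh or ssh. The -n flag keeps the remote command from reading stdin.
check_hboot() {
    remote=${REMOTE_SHELL:-rsh}
    for host in $(hosts_from_schema "$1"); do
        if path=$($remote -n "$host" 'which hboot' 2>/dev/null) && [ -n "$path" ]; then
            echo "$host: $path"
        else
            echo "$host: hboot not found in PATH"
        fi
    done
}
```

With your file this would be run as "check_hboot /home/ahmed/lamhost". One
caveat: some versions of "which" print a "no hboot in ..." message on stdout
instead of failing, so read the output rather than trusting the
found/not-found wording blindly.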