On Apr 12, 2006, at 12:17 PM, Mahmoud Payami wrote:
> Dear LAM users and developers,
>
> I am a novice in LAM/MPI and trying to install but still failed. I
> have traced the LAM UG and FAQ but all points mentioned in them are
> satisfied pointwise.
> The steps in configuring and making are as follows:
>
> 1- FC=ifort F77=ifort
> 2- export FC F77
> 3- ./configure --with-rsh="/usr/bin/ssh -x"
A couple of alternatives here. You can pass the compilers directly on
the configure command line instead of exporting them first:
./configure --with-rsh="/usr/bin/ssh -x" FC=ifort F77=ifort
Instead of using the '--with-rsh' configure option, you could just
export the environment variable LAMRSH in your .bashrc file:
export LAMRSH="ssh -x"
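A minimal sketch of the .bashrc approach, assuming a Bourne-style shell such as bash. The value contains a space, so it needs quotes; without them the shell would hand -x to export as a separate argument and only "ssh" would end up in LAMRSH. The printf line is just a suggested sanity check, not something LAM requires.

```sh
# Sketch for ~/.bashrc: the quotes keep "ssh -x" together as one value.
export LAMRSH="ssh -x"

# Print the value to confirm what LAM will see in its environment.
printf 'LAMRSH=%s\n' "$LAMRSH"
```

Remember to start a new login shell (or source .bashrc) before running lamboot so the variable is actually set.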
Other notes below...
> 4- make
> 5- make install (with root account).
> 6- I have made a file named "hostfile" containing the two lines:
> condmat1.ctpm.aeoi.org cpu=2
> condmat10.ctpm.aeoi.org cpu=2
> 7- The bin directory (/usr/local/bin) has been added in the
> environmental setting in .bashrc
> 8- I can ssh to the remote node without password
>
> Now as I try to boot lam, I receive the following messages. I would
> appreciate any comment.
>
> Best regards,
> Mahmoud Payami
> ----------------------------------------------------------------------
> [mahmoud@condmat1 ~]$ lamboot -v -ssi boot rsh hostfile
>
> LAM 7.0.6/MPI 2 C++/ROMIO - Indiana University
>
> n-1<27159> ssi:boot:base:linear: booting n0 (condmat1.ctpm.aeoi.org)
> n-1<27159> ssi:boot:base:linear: booting n1 (condmat10.ctpm.aeoi.org)
> ----------------------------------------------------------------------
> LAM failed to execute a process on the remote node
> "condmat10.ctpm.aeoi.org".
> LAM was not trying to invoke any LAM-specific commands yet -- we were
> simply trying to determine what shell was being used on the remote
> host.
>
> LAM tried to use the remote agent command
> "/home/mahmoud/lam-7.0.6/share/ssi/boot/rsh/ssh"
> to invoke "echo $SHELL" on the remote node.
>
> This usually indicates an authentication problem with the remote
> agent, or some other configuration type of error in your .cshrc or
> .profile file. The following is a list of items that you may wish to
> check on the remote node:
>
> - You have an account and can login to the remote machine
> - Incorrect permissions on your home directory (should
> probably be 0755)
> - Incorrect permissions on your $HOME/.rhosts file (if you are
> using rsh -- they should probably be 0644)
> - You have an entry in the remote $HOME/.rhosts file (if you
> are using rsh) for the machine and username that you are
> running from
> - Your .cshrc/.profile must not print anything out to the
> standard error
> - Your .cshrc/.profile should set a correct TERM type
> - Your .cshrc/.profile should set the SHELL environment
> variable to your default shell
>
> Try invoking the following command at the unix command line:
>
> /home/mahmoud/lam-7.0.6/share/ssi/boot/rsh/ssh -x condmat10.ctpm.aeoi.org -n echo $SHELL
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> ----------------------------------------------------------------------
> n-1<27159> ssi:boot:base:linear: Failed to boot n1 (condmat10.ctpm.aeoi.org)
> n-1<27159> ssi:boot:base:linear: aborted!
I wonder why it is trying to execute:
/home/mahmoud/lam-7.0.6/share/ssi/boot/rsh/ssh
What does the following command return:
$ which ssh
It should return something like:
/usr/bin/ssh
Have you tried following the suggestions from the lamboot error message?
Something like:
$ /home/mahmoud/lam-7.0.6/share/ssi/boot/rsh/ssh -x condmat10.ctpm.aeoi.org -n echo $SHELL
or, better yet:
$ /usr/bin/ssh -x condmat10.ctpm.aeoi.org -n echo $SHELL
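If it helps, here is a rough sketch of how that manual check could be scripted. It tests the conditions the lamboot message lists: no password prompt, no error, and nothing printed besides the shell path. The AGENT variable and the check_remote_shell function are made-up names for this example, not part of LAM. The BatchMode option is a standard OpenSSH setting that makes ssh fail rather than prompt. Note that the sketch single-quotes the remote command so that $SHELL is expanded on the remote node; typed unquoted at a prompt, your local shell expands it first.

```sh
#!/bin/sh
# AGENT is a hypothetical variable for this sketch, not a LAM setting.
# BatchMode=yes makes ssh fail instead of prompting for a password.
AGENT=${AGENT:-"/usr/bin/ssh -x -o BatchMode=yes"}

# check_remote_shell HOST
# Succeeds only if the agent exits with status 0 and prints exactly one
# line that looks like an absolute path (for example /bin/bash).
check_remote_shell() {
    host=$1
    # Single quotes: $SHELL is expanded on the remote node, not locally.
    # $AGENT is left unquoted on purpose so its options split into words.
    if ! out=$($AGENT -n "$host" 'echo $SHELL' 2>&1); then
        echo "FAIL: remote agent returned an error: $out"
        return 1
    fi
    # Count the lines of combined stdout/stderr output.
    lines=$(printf '%s\n' "$out" | grep -c '')
    case $out in
        /*) ;;
        *)  echo "FAIL: unexpected output: $out"; return 1 ;;
    esac
    if [ "$lines" -ne 1 ]; then
        echo "FAIL: extra output from the remote node: $out"
        return 1
    fi
    echo "OK: remote shell on $host is $out"
}
```

After sourcing the file, you would run something like "check_remote_shell condmat10.ctpm.aeoi.org". A FAIL about extra output usually points at a login banner or something in .bashrc/.profile printing text.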
----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/