
LAM/MPI General User's Mailing List Archives


From: Aamir Shafi (aamir.shafi_at_[hidden])
Date: 2004-08-11 05:52:43


As the LAM output suggests, this is a $PATH problem. When you run
lamboot, LAM uses this command
/usr/bin/ssh -x compute-0-0 -n hboot -t -c lam-conf.lam -v -s -I "-H
10.1.1.1 -P 45650 -n 1 -o 0 "
to start processes on the remote nodes, but it can't find hboot on the
remote node. Make sure $LAM_HOME/bin is in the path there. Modifying
.bash_profile won't help, because bash reads that file only for login
shells, and a command started through ssh does not run in one. To make
it work, you need to set the path in .bashrc.
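
A minimal sketch of that change, in case it helps. The function name
add_lam_to_path and the /opt/lam prefix are made up for illustration;
substitute the directory where LAM is actually installed on your cluster.
It appends the PATH line to the given startup file only if the line is
not already there.

```shell
# add_lam_to_path RC_FILE LAM_PREFIX
# Hypothetical helper: appends an "export PATH=..." line for
# LAM_PREFIX/bin to RC_FILE, unless the exact line is already present.
add_lam_to_path() {
    rc_file=$1
    lam_home=$2
    # \$PATH stays literal so it is expanded when the rc file is read.
    line="export PATH=\"$lam_home/bin:\$PATH\""
    grep -qxF -- "$line" "$rc_file" 2>/dev/null ||
        printf '%s\n' "$line" >> "$rc_file"
}

# Example use (assumed prefix, adjust to your installation):
#   add_lam_to_path "$HOME/.bashrc" /opt/lam
# Then check what a non-interactive remote shell sees:
#   ssh compute-0-0 'command -v hboot'
```

If your .bashrc returns early for non-interactive shells, the PATH line
has to come before that point, so appending at the end may not be enough.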

Hope it helps
--Aamir

Yiyang Sun wrote:

> Dear LAM users,
>
> I got the error message below when running lamboot.
> I installed LAM in an NFS directory on a P4 cluster
> (the LAM executables can be seen on all nodes).
> Also, I'm not prompted for a password when "ssh"-ing
> to any of the nodes. Any idea? Thanks a lot.
>
> BTW, I use 6.5.9 because I got an error message when
> compiling 7.0.x.
>
> Yiyang
>
>
> LAM 6.5.9/MPI 2 C++ - Indiana University
>
> Executing hboot on n0 (matrix - 1 CPU)...
> Executing hboot on n1 (compute-0-0 - 1 CPU)...
> bash: line 1: hboot: command not found
> -----------------------------------------------------------------------------
>
> LAM failed to execute a LAM binary on the remote node "compute-0-0".
> Since LAM was already able to determine your remote shell as "hboot",
> it is probable that this is not an authentication problem.
>
> LAM tried to use the remote agent command "/usr/bin/ssh"
> to invoke the following command:
>
> /usr/bin/ssh -x compute-0-0 -n hboot -t -c lam-conf.lam -v -s
> -I "-H 10.1.1.1 -P 45650 -n 1 -o 0 "
>
> This can indicate several things. You should check the following:
>
> - The LAM binaries are in your $PATH
> - You can run the LAM binaries
> - The $PATH variable is set properly before your
> .cshrc/.profile exits
>
> Try to invoke the command listed above manually at a Unix prompt.
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
>
> -----------------------------------------------------------------------------
>
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
>
> wipe ...
>
> LAM 6.5.9/MPI 2 C++ - Indiana University
>
> Executing tkill on n0 (matrix)...
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>