LAM/MPI General User's Mailing List Archives

From: Aamir Shafi (aamir.shafi_at_[hidden])
Date: 2004-08-05 05:19:32


Hi,

Sorry, guys, it seems to be a PATH problem.
ssh comp00 'which tkill' prints nothing, and ssh comp00 'echo $PATH'
doesn't show $LAM_HOME/bin in the path. So I hope I can fix this now.
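
The check above can be scripted. Below is a minimal sketch in POSIX sh. It assumes passwordless ssh to the node, that `$LAM_HOME` is set locally, and that LAM is installed at the same path on every node. `path_contains` and `check_node` are hypothetical helper names, not LAM commands. Each check goes through `ssh -n host '...'`, the same way LAM's boot invoked `tkill` in the log below, because a non-interactive remote shell can end up with a different PATH than an interactive login.

```shell
# path_contains DIR PATHLIST: succeed if DIR is exactly one of the
# colon-separated entries in PATHLIST (/opt/lam/bin2 does not match /opt/lam/bin).
path_contains() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# check_node HOST: hypothetical helper that repeats the manual checks
# (remote output, remote PATH, location of tkill) for one node.
check_node() {
    host=$1
    # LAM expects no output from the remote start-up files.
    noise=$(ssh -n "$host" true 2>&1)
    [ -z "$noise" ] || echo "$host: start-up files print: $noise"
    # Single quotes: $PATH is expanded by the remote shell, not locally.
    remote_path=$(ssh -n "$host" 'echo $PATH')
    if path_contains "$LAM_HOME/bin" "$remote_path"; then
        echo "$host: remote PATH includes $LAM_HOME/bin"
    else
        echo "$host: remote PATH lacks $LAM_HOME/bin: $remote_path"
    fi
    # command -v prints tkill's full path, or fails if it is not found.
    ssh -n "$host" 'command -v tkill' || echo "$host: tkill not found"
}
```

Usage: run `check_node comp00` from a shell where `LAM_HOME` is set. The exact-match `case` pattern avoids false positives from directories that merely share a prefix with `$LAM_HOME/bin`.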

Regards
--Aamir
Aamir Shafi wrote:

> Hi,
>
> The following output seems to say that 'tkill' can't be found. So
> basically, when LAM tries to ssh into the compute nodes, it can't find
> 'tkill'. My question is: how do I make it find it? If it's about adding
> $LAM_HOME/bin to the PATH, it's already there. What am I missing?
>
> Thanks for any help
> --Aamir
> shafia_at_holly:~/install/lam-7.0.6/examples/ring$ recon -v lamhosts
> n-1<6434> ssi:boot:base:linear: booting n0 (holly.starbug.dsg.port.ac.uk)
> n-1<6434> ssi:boot:base:linear: booting n1 (comp00.starbug.dsg.port.ac.uk)
> ERROR: LAM/MPI unexpectedly received the following on stderr:
> bash: line 1: tkill: command not found
> -----------------------------------------------------------------------------
>
> LAM failed to execute a LAM binary on the remote node
> "comp00.starbug.dsg.port.ac.uk".
> Since LAM was already able to determine your remote shell as "tkill",
> it is probable that this is not an authentication problem.
>
> LAM tried to use the remote agent command "ssh"
> to invoke the following command:
>
> ssh comp00.starbug.dsg.port.ac.uk -n tkill -N -v
>
> This can indicate several things. You should check the following:
>
> - The LAM binaries are in your $PATH
> - You can run the LAM binaries
> - The $PATH variable is set properly before your
> .cshrc/.profile exits
>
> Try to invoke the command listed above manually at a Unix prompt.
>
> You will need to configure your local setup such that you will *not*
> be prompted for a password to invoke this command on the remote node.
> No output should be printed from the remote node before the output of
> the command is displayed.
>
> When you can get this command to execute successfully by hand, LAM
> will probably be able to function properly.
> -----------------------------------------------------------------------------
>
> n-1<6434> ssi:boot:base:linear: Failed to boot n1 (comp00.starbug.dsg.port.ac.uk)
> n-1<6434> ssi:boot:base:linear: aborted!
> -----------------------------------------------------------------------------
>
> recon was not able to complete successfully. There can be any number
> of problems that did not allow recon to work properly. You should use
> the "-d" option to recon to get more information about each step that
> recon attempts.
>
> Any error message above may present a more detailed description of the
> actual problem.
>
> Here is general a list of prerequisites that *must* be fulfilled
> before recon can work:
>
> - Each machine in the hostfile must be reachable and operational.
> - You must have an account on each machine.
> - You must be able to rsh(1) to the machine (permissions
> are typically set in the user's $HOME/.rhosts file).
>
> *** Sidenote: If you compiled LAM to use a remote shell program
> other than rsh (with the --with-rsh option to ./configure;
> e.g., ssh), or if you set the LAMRSH environment variable
> to an alternate remote shell program, you need to ensure
> that you can execute programs on remote nodes with no
> password. For example:
>
> unix% ssh -x pinky uptime
> 3:09am up 211 day(s), 23:49, 2 users, load average: 0.01, 0.08, 0.10
>
> - The LAM executables must be locatable on each machine, using
> the shell's search path and possibly the LAMHOME environment
> variable.
> - The shell's start-up script must not print anything on standard
> error. You can take advantage of the fact that rsh(1) will
> start the shell non-interactively. The start-up script (such
> as .profile or .cshrc) can exit early in this case, before
> executing many commands relevant only to interactive sessions
> and likely to generate output.
> -----------------------------------------------------------------------------
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
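LAM's message lists the usual requirements. The LAM binaries must be on the search path, PATH must be set before the start-up script exits, and the script must print nothing for non-interactive shells. Below is a minimal `~/.bashrc` sketch of that ordering. It assumes bash is the shell on the nodes (the `bash: line 1` error suggests it is) and that LAM lives under a hypothetical prefix `/opt/lam`, which you should replace with the real install directory. `LAM_HOME` here is the poster's own variable; LAM itself consults `LAMHOME`. Whether bash reads `~/.bashrc` for commands run over ssh depends on how bash was built, so confirm afterwards with `ssh comp00 'which tkill'`.

```shell
# Sketch of ~/.bashrc. /opt/lam is a hypothetical install prefix.
LAM_HOME=/opt/lam

# Set PATH unconditionally, before the interactive check below, so that
# non-interactive shells started by ssh (as during recon/lamboot) find tkill.
case ":$PATH:" in
    *":$LAM_HOME/bin:"*) ;;            # already present; avoid duplicates
    *) PATH="$LAM_HOME/bin:$PATH" ;;
esac
export PATH

# Non-interactive shell: stop reading this file before anything that
# might print output (LAM reports unexpected stderr output as an error).
case $- in
    *i*) ;;
    *) return ;;
esac

# Interactive-only settings (prompt, aliases, greetings) go below here.
```

Putting the PATH lines above the `case $- in` guard is the key design choice. Anything placed after the guard is skipped for the non-interactive shells LAM starts.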