LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-09-01 21:52:00


On Sep 1, 2005, at 3:48 PM, Keith Stevens wrote:

> I am running LAM/MPI precompiled with the Intel C Compiler. This is
> running on a Rocks 3.3 Linux Cluster. Below are outputs from the lam
> commands:

I don't know any of the details of the Rocks bundling of LAM/MPI, but
I'll take a shot...

> Output from recon:
> -------------------------
>
> /opt/lam/intel/bin/recon -d
> [snipped]
> n-1<11238> ssi:boot:base:linear: booting n0 (localhost)
> n-1<11238> ssi:boot:rsh: starting recon on (localhost)
> n-1<11238> ssi:boot:rsh: starting on n0 (localhost): tkill -N -d
> n-1<11238> ssi:boot:rsh: launching locally
> n-1<11238> ssi:boot:base:linear: Failed to boot n0 (localhost)
> n-1<11238> ssi:boot:base:linear: aborted!

I'm guessing that /opt/lam/intel/bin is not in your PATH. As the help
message states, LAM needs to have its executables in your PATH to
function properly -- can you check and see if it is?
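A minimal sketch of one way to check this and fix it from a Bourne-style shell. The helper name `ensure_on_path` is hypothetical (it is not part of LAM). The directory is the prefix from your recon output:

```sh
# Hypothetical helper (not part of LAM): prepend DIR to PATH unless it is
# already one of PATH's colon-separated entries.
ensure_on_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                        # already present; leave PATH alone
        *) PATH="$1:$PATH"; export PATH ;;  # prepend so these binaries are found first
    esac
}

# Prefix taken from the recon output above; adjust if your install differs.
ensure_on_path /opt/lam/intel/bin
```

Afterwards, `command -v recon` should print `/opt/lam/intel/bin/recon`. Put the equivalent setting in a shell startup file that non-interactive remote shells also read (for bash, typically `~/.bashrc`). That way the rsh boot module can find the LAM executables on every node, not only in your current login shell.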

> -----------------------------------------------------------------------------
> recon was not able to complete successfully.  There can be any number
> [snipped]
>         - The LAM executables must be locatable on each machine, using
>           the shell's search path and possibly the LAMHOME environment
>           variable. 
> -----------------------------------------------------------------------------

> LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
>
> n-1<11251> ssi:boot:base:linear: booting n0 (compute-0-1)
> n-1<11251> ssi:boot:base:linear: Failed to boot n0 (compute-0-1)
> n-1<11251> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully

If recon fails, lamboot is almost guaranteed to fail in the same way.

>  additional lamboot recon errors: (I think that this one has helpful
> info)
>  
> /opt/lam/intel/bin/recon -d /opt/lam/intel/bin/lamboot

If you give recon a command line argument, it treats that argument as the
name of a text hostfile. So it tried to read the lamboot executable as a
hostfile and got confused (hence the binary output in the error message).
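To use recon the intended way, pass it (and later lamboot) a plain-text hostfile that lists one node name per line. This is a minimal sketch. The file name `lamhosts` is hypothetical, and compute-0-1 is the node name from your lamboot output:

```sh
# Write a minimal LAM hostfile (boot schema): one node name per line.
# "lamhosts" is a hypothetical file name; compute-0-1 comes from the lamboot output.
cat > lamhosts <<'EOF'
compute-0-1
EOF

# Check the boot with recon first, then boot for real with lamboot.
# In both cases the argument is the hostfile, not a program.
if command -v recon >/dev/null 2>&1; then
    recon -d lamhosts && lamboot -d lamhosts
else
    echo "recon not found in PATH; fix PATH first" >&2
fi
```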

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/