LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-05-27 09:01:50


On May 27, 2005, at 9:29 AM, Beth Kirschner wrote:

> I'm having trouble getting 'lamboot' to execute from within a PBS
> script on a Mac OSX box. It runs fine without PBS. Has anyone else had
> success with this?

I *think* that we have tested this (PBS/Torque on OSX), but I can't
swear to it. Hypothetically, it *should* be the same as it is on Linux
-- there really shouldn't be any difficulties with this. If there are,
it's a bug that we should fix.

> I've tried building Lam 7.1.1 in two configurations:
>
> # configure --prefix=/usr/local/lam-7.1.1 -with-rsh="ssh -x"
> --without-fc
> # configure --prefix=/usr/local/lam-7.1.1 -with-rsh="ssh -x"
> --without-fc --with-boot=tm --with-boot-tm=/usr/local/pbs

Can you send the output of the latter? I'd like to see the full output
of the configure run that used the --with-boot... switches (please
compress it). Also send the corresponding top-level config.log file and
share/ssi/boot/tm/config.log.
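
One possible way to collect and compress those files is sketched below.
The helper name bundle_logs and the file names configure.out and
lam-tm-logs.tar.gz are made up for illustration; only the two config.log
paths come from the request above.

```sh
#!/bin/sh
# Hypothetical helper: pack the given files into a gzip-compressed tarball.
#   $1     = name of the archive to create
#   $2...  = files to include
bundle_logs() {
    archive=$1
    shift
    tar -czf "$archive" "$@"
}

# Usage, run from the top of the LAM build tree (file names are assumptions):
#   ./configure <your switches> 2>&1 | tee configure.out
#   bundle_logs lam-tm-logs.tar.gz configure.out config.log \
#       share/ssi/boot/tm/config.log
```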

You can also check to ensure that the TM support built properly by
running the laminfo command. It will show you all the modules that
were built into LAM. If the "tm" boot module is not listed, then the
PBS/Torque support did not build properly.
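
As a rough sketch of that check, the snippet below scans laminfo output
for a boot module named tm. The exact line format (something like
"SSI boot: tm (...)") is an assumption here, as is the helper name
has_tm_boot; adjust the pattern to whatever your laminfo actually prints.

```sh
#!/bin/sh
# Hypothetical helper: reads laminfo-style output on stdin and succeeds
# (exit status 0) if some line mentions a boot module named "tm".
# ASSUMPTION: boot modules appear on lines containing "boot", with the
# module name as a separate word, e.g. "SSI boot: tm (API ..., Module ...)".
has_tm_boot() {
    grep -i 'boot' | grep -Eq '(^|[^[:alnum:]_])tm([^[:alnum:]_]|$)'
}

# Usage (needs laminfo from the installation being checked):
#   /usr/local/lam-7.1.1/bin/laminfo | has_tm_boot \
#       && echo "tm boot module found" \
#       || echo "tm boot module missing"
```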

If it did not build properly, the output from configure should shed
light on the reason why (the determination of whether to build a given
module or not is made during configure).

> Here's the script I've been running:
>
> #PBS -l nodes=1:ppn=2
> /usr/local/lam-7.1.1/bin/lamboot -d -v ${PBS_NODEFILE}
>
> Here's some of the output:
>
> n-1<7964> ssi:boot:base:server: opened port 55040
> n-1<7964> ssi:boot:base:linear: booting n0 (x.grid.umich.edu)
> n-1<7964> ssi:boot:rsh: starting lamd on (x.grid.umich.edu)
> n-1<7964> ssi:boot:rsh: starting on n0 (x.grid.umich.edu): hboot -t
> -c lam-conf.lamd -d -v -sessionsuffix pbs-3497.x.grid.umich.edu -I -H
> 141.211.23.234 -P 55040 -n 0 -o 0
> n-1<7964> ssi:boot:rsh: launching locally
> n-1<7964> ssi:boot:base:linear: Failed to boot n0 (x.grid.umich.edu)
> n-1<7964> ssi:boot:base:server: closing server socket
> n-1<7964> ssi:boot:base:linear: aborted!
> lamboot did NOT complete successfully

Note that the rsh module was chosen instead of the tm module -- this
suggests that the tm support was not built and included in your
LAM installation. I can't say this for sure without the other data (see
above), but it's one possible explanation.
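
To spot this quickly in a long "lamboot -d -v" log, something like the
following could pull out which boot module names appear. It is only a
sketch: the helper name boot_modules_used is invented, and it assumes
that every boot module tags its debug lines with "ssi:boot:<name>:" the
way the rsh lines above do.

```sh
#!/bin/sh
# Hypothetical helper: reads lamboot debug output on stdin and prints the
# distinct boot module names found in "ssi:boot:<name>:" prefixes.
# The shared "base" framework lines are filtered out.
# ASSUMPTION: modules other than rsh use the same prefix convention.
boot_modules_used() {
    sed -n 's/.*ssi:boot:\([a-z]*\):.*/\1/p' | grep -v '^base$' | sort -u
}

# Usage inside the PBS script:
#   /usr/local/lam-7.1.1/bin/lamboot -d -v ${PBS_NODEFILE} 2>&1 \
#       | boot_modules_used
```

If this prints rsh when you expected tm, that would be consistent with
the tm module not being available in the installation.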

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/