
LAM/MPI General User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-06-22 08:40:46


On Jun 21, 2005, at 11:25 AM, Ben Boxman wrote:

>    I'm trying to set up Torque+LAM to run on a small 8 node
> (2xOpteron) cluster and I'm experiencing some difficulties with
> getting LAM to run under Torque. Torque seems to be running fine and
> LAM will work on a few nodes without Torque.
>  
>   Specifically, when I attempt to run lamboot under a torque job
> (whether interactively or in a pbs script) it fails. Lamboot will
> succeed if run on a single node under a Torque job but will fail for
> any node number greater than 1.
>
> n-1<4121> ssi:boot:base:linear_windowed: booting n0 (wild2.camero-tech.com)
> n-1<4121> ssi:boot:tm: starting wipe on (wild2.camero-tech.com)
> n-1<4121> ssi:boot:tm: starting on n0 (wild2.camero-tech.com): /usr/bin/tkill -setsid -d
> n-1<4121> ssi:boot:tm: successfully launched on n0 (wild2.camero-tech.com)
> n-1<4121> ssi:boot:tm: waiting for completion on n0 (wild2.camero-tech.com)
> n-1<4121> ssi:boot:base:linear_windowed: Failed to boot n0 (wild2.camero-tech.com)
> n-1<4121> ssi:boot:base:linear_windowed: finished launching
> n-1<4121> ssi:boot:base:server: closing server socket
> n-1<4121> ssi:boot:base:linear_windowed: aborted!
> lamboot did NOT complete successfully

Do you have LAM installed in the same place on all the nodes in your
cluster? TM doesn't do any shell evaluation, so LAM launches its helper
programs by absolute path (your log shows /usr/bin/tkill, for example).
That path has to be valid on every node, which means LAM must be
installed in the same place on each one. It looks like LAM isn't finding
any of its programs on wild2. This is just a guess, since there isn't
much information to go on, but that's what it looks like. If that isn't
it, could you check the PBS log files for anything useful?
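One way to test the "same install location" theory from inside a Torque job is to run a small check on every allocated node through TM itself, for example with Torque's pbsdsh, so the check sees the same no-shell environment that lamboot does. The Python sketch below is hypothetical: the directory /usr/bin is taken from the tkill path in the log above, and the list of programs checked (lamd, tkill) is an assumption, so adjust both for your installation. Because TM does no shell evaluation, give absolute paths to both the interpreter and the script (e.g. `pbsdsh /usr/bin/python /shared/path/check_lam.py`, where the script path is a placeholder).

```python
import os
import socket
import sys

# Assumption: LAM's helper programs live here on every node. /usr/bin comes
# from the lamboot log (/usr/bin/tkill); change it to match your install.
DEFAULT_BINDIR = "/usr/bin"

# Assumption: a few LAM executables a TM boot needs; extend as required.
REQUIRED = ("lamd", "tkill")


def missing_lam_programs(bindir=DEFAULT_BINDIR, required=REQUIRED):
    """Return the required programs that are absent or not executable in bindir."""
    missing = []
    for name in required:
        path = os.path.join(bindir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


def main(argv):
    # Optional first argument overrides the directory to check.
    bindir = argv[1] if len(argv) > 1 else DEFAULT_BINDIR
    missing = missing_lam_programs(bindir)
    host = socket.gethostname()
    if missing:
        print(f"{host}: MISSING in {bindir}: {', '.join(missing)}")
        return 1
    print(f"{host}: ok ({bindir})")
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Any node that reports MISSING is a likely candidate for the "Failed to boot" message.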

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/