
LAM/MPI General User's Mailing List Archives


From: Korambath, Prakashan (ppk_at_[hidden])
Date: 2003-12-05 17:09:30


Hi,

  I am getting the following error when running the command "lamboot -d
hostfile" from a Sun Grid Engine queue script on a Linux cluster. The
LAM version is 7.0.2. The error message is below. From the command line
I can run lamboot without any problem. I think it has something to do
with LAM adding some extra options when it runs under Sun Grid Engine.
Thanks for your help in resolving this issue.

Prakashan Korambath
ATS, UCLA

/work/29416.1.p02-30m/nodefile
i02.bwc.ats.ucla.edu
i02.bwc.ats.ucla.edu
n0<32556> ssi:boot: Opening
n0<32556> ssi:boot: opening module globus
n0<32556> ssi:boot: initializing module globus
n0<32556> ssi:boot:globus: globus-job-run not found, globus boot will not run
n0<32556> ssi:boot: module not available: globus
n0<32556> ssi:boot: opening module rsh
n0<32556> ssi:boot: initializing module rsh
n0<32556> ssi:boot:rsh: module initializing
n0<32556> ssi:boot:rsh:agent: ssh
n0<32556> ssi:boot:rsh:username: <same>
n0<32556> ssi:boot:rsh:verbose: 1000
n0<32556> ssi:boot:rsh:algorithm: linear
n0<32556> ssi:boot:rsh:priority: 10
n0<32556> ssi:boot: module available: rsh, priority: 10
n0<32556> ssi:boot: finalizing module globus
n0<32556> ssi:boot:globus: finalizing
n0<32556> ssi:boot: closing module globus
n0<32556> ssi:boot: Selected boot module rsh
n0<32556> ssi:boot:base: looking for boot schema in following directories:
n0<32556> ssi:boot:base: $TROLLIUSHOME/etc
n0<32556> ssi:boot:base: $LAMHOME/etc
n0<32556> ssi:boot:base: /u/local/mpi/mpi-lam.7.0.2/etc
n0<32556> ssi:boot:base: looking for boot schema file:
n0<32556> ssi:boot:base: /work/29416.1.p02-30m/nodefile
n0<32556> ssi:boot:base: found boot schema: /work/29416.1.p02-30m/nodefile
n0<32556> ssi:boot:rsh: found the following hosts:
n0<32556> ssi:boot:rsh: n0 i02.bwc.ats.ucla.edu (cpu=2)
n0<32556> ssi:boot:rsh: resolved hosts:
n0<32556> ssi:boot:rsh: n0 i02.bwc.ats.ucla.edu --> 10.10.64.2 (origin)
n0<32556> ssi:boot:rsh: starting RTE procs
n0<32556> ssi:boot:base:linear: starting
n0<32556> ssi:boot:base:server: opening server TCP socket
n0<32556> ssi:boot:base:server: opened port 55218
n0<32556> ssi:boot:base:linear: booting n0 (i02.bwc.ats.ucla.edu)
n0<32556> ssi:boot:rsh: starting lamd on (i02.bwc.ats.ucla.edu)
n0<32556> ssi:boot:rsh: starting on n0 (i02.bwc.ats.ucla.edu): hboot -t -c lam-conf.lamd -d -sessionsuffix sge-29416-0 -I -H 10.10.64.2 -P 55218 -n 0 -o 0
n0<32556> ssi:boot:rsh: launching locally

LAM 7.0.2/MPI 2 C++/ROMIO - Indiana University

n0<32556> ssi:boot:base:linear: Failed to boot n0 (i02.bwc.ats.ucla.edu)
n0<32556> ssi:boot:base:server: closing server socket
n0<32556> ssi:boot:base:linear: aborted!
-----------------------------------------------------------------------------
lamboot encountered some error (see above) during the boot process, and
will now attempt to kill all nodes that it was previously able to boot
(if any).

Please wait for LAM to finish; if you interrupt this process, you may
have LAM daemons still running on remote nodes.
-----------------------------------------------------------------------------
lamboot did NOT complete successfully

LAM 7.0.2/MPI 2 C++/ROMIO - Indiana University

lamboot: wipe -- nothing to do
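In the debug output, LAM reads the two identical lines in the nodefile as a single node, "n0 i02.bwc.ats.ucla.edu (cpu=2)". This suggests the host list was parsed as intended and that the failure happens later, when the local lamd is launched. The Python sketch below mimics that counting step so the mapping is explicit. collapse_hosts and describe are hypothetical helpers, not LAM's own boot-schema parser. The sketch assumes one hostname per line, that "#" starts a comment, and that each repeat of a host adds one CPU.

```python
# Hypothetical helpers (not LAM's own parser): assumes one hostname per line,
# '#' starts a comment, and every repeat of a host adds one CPU.
def collapse_hosts(lines):
    """Return [(host, cpu_count), ...] in first-seen order."""
    counts = {}
    for line in lines:
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue  # skip blank and comment-only lines
        host = fields[0]  # extra tokens such as "cpu=N" are ignored here
        counts[host] = counts.get(host, 0) + 1
    return list(counts.items())  # dicts keep insertion order (Python 3.7+)


def describe(lines):
    """Format nodes like the 'found the following hosts' debug lines."""
    return [f"n{i} {host} (cpu={cpus})"
            for i, (host, cpus) in enumerate(collapse_hosts(lines))]


if __name__ == "__main__":
    nodefile = ["i02.bwc.ats.ucla.edu", "i02.bwc.ats.ucla.edu"]
    for entry in describe(nodefile):
        print(entry)  # n0 i02.bwc.ats.ucla.edu (cpu=2)
```

Counting repeats into a CPU total, rather than creating one node per line, matches the single "n0 ... (cpu=2)" entry that LAM reports for this nodefile.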