LAM/MPI General User's Mailing List Archives

From: Ole Holm Nielsen (Ole.H.Nielsen_at_[hidden])
Date: 2005-09-16 05:06:09


It turns out that the problem was in our Torque installation.
lam-7.1.2b26 now boots successfully using the "tm" boot module
under Torque 1.2.0p6.

The simplest test of the Torque/PBS "tm" system, without ever
invoking any MPI daemons, is to run the following command from
within a PBS batch job:

    pbsdsh hostname

This should simply list the node names allocated to your PBS job,
using the "tm" interface to connect to all nodes. If pbsdsh fails,
so will any LAM-MPI command that uses the "tm" interface.
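
For anyone who wants to submit this check as a job of its own, here is
a minimal sketch of a batch script. The job name, the nodes=2 request
and the tm_check function are my own illustrative choices, not anything
Torque or LAM-MPI provides, and the script assumes that pbsdsh returns
a nonzero exit status when it cannot reach the nodes.

```shell
#!/bin/sh
#PBS -N tm-check
#PBS -l nodes=2

# tm_check is a hypothetical helper name: it runs "pbsdsh hostname" and
# reports whether the command succeeded. It assumes pbsdsh exits nonzero
# when the "tm" interface cannot start the command on the nodes.
tm_check() {
    if ! command -v pbsdsh >/dev/null 2>&1; then
        echo "tm check: pbsdsh not found"
        return 2
    fi
    if pbsdsh hostname; then
        echo "tm check: OK"
    else
        echo "tm check: FAILED"
        return 1
    fi
}

# Run the check only inside a PBS job, where PBS_JOBID is set.
if [ -n "${PBS_JOBID:-}" ]; then
    tm_check
fi
```

Submit it with qsub and look at the job's output file: you should see
the allocated node names followed by the "tm check" line.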

The discussion of the Torque problem can be read here:
http://www.supercluster.org/pipermail/torqueusers/2005-September/thread.html
In short, pbs_mom had the wrong path to the pbs_demux executable
compiled in, an error introduced when the RPMs were built.

Ole Holm Nielsen wrote:
> When we upgraded our test cluster from the Torque batch system
> version torque-1.2.0p4 to torque-1.2.0p6, parallel jobs using
> LAM-MPI beta version lam-7.1.2b22 would no longer boot the LAM
> daemons. I downloaded and rebuilt lam-7.1.2b26 with the new
> Torque libraries, but that didn't help any.
>
> The problem with Torque is specific to LAM-MPI (serial jobs run
> perfectly well). When LAM-MPI selects a boot schema in a Torque
> batch job, it defaults to the Torque/OpenPBS "tm" schema.
> Unfortunately, this tm schema is unable to boot correctly (see
> output below). If I force LAM-MPI to use the "rsh" boot schema
> (export LAM_MPI_SSI_boot_tm_priority=1), everything with LAM-MPI
> works just fine! It is of course possible that LAM-MPI used
> to default to the "rsh" boot schema with torque-1.2.0p4, but we
> can't verify that any more.
>
> Question: Is LAM-MPI's "tm" boot schema supposed to work
> correctly with Torque? I'd love to get it working because of the
> performance improvements promised in the LAM-MPI documentation.

-- 
Ole Holm Nielsen
Department of Physics, Technical University of Denmark