LAM/MPI General User's Mailing List Archives

From: Ole Holm Nielsen (Ole.H.Nielsen_at_[hidden])
Date: 2005-09-15 09:19:39


Jeff Squyres wrote:
>> Question: Is Torque's LAM-MPI "tm" boot schema supposed to be
>> working correctly with Torque ? I'd love to get it to
>> work because of the performance improvements promised in the
>> LAM-MPI documentation.
>
> Yes, it is. I just upgraded our test cluster from 1.2.0p1 to 1.2.0p6
> to ensure that nothing broke, and it seems to work fine for me.
>
> One thing to check is to ensure that your new LAM installation is in
> your $PATH and that it is installed on all nodes. For example, is
> /usr/local/lam-7.1.2-pgi the correct directory, and updated with the
> new version on all of your nodes?

Well, it seems to be OK. The /usr/local/lam-7.1.2-pgi directory
tree is rsync'ed from the central server, and the files therein
have identical timestamps and sizes on all nodes.

My $PATH appears to be OK. Also, when recon executes it picks up
/usr/local/lam-7.1.2-pgi/bin/tkill (the path is correct) on the
master node (see my previous mail). I don't know whether the correct
$PATH is set on the slave nodes when LAM boots with the "tm" schema.
Is there a way to check that? In our setup the user's .cshrc
file is responsible for setting LAMHOME and PATH to point to
/usr/local/lam-7.1.2-pgi.
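
One possible way to check this is a minimal sketch along the following
lines, assuming Python is available on the nodes. It reports whether the
LAM bin directory is on PATH, what LAMHOME is set to, and which tkill
would be found. The function name lam_path_report is hypothetical, and
how the script gets started on each slave node under the "tm" schema is
left open here; the result is only meaningful if it runs in the same
environment that the LAM daemons are started in.

```python
import os
import shutil
import socket


def lam_path_report(lam_prefix, environ=None):
    """Summarise the LAM-related environment seen by this process.

    lam_prefix is the installation prefix to look for, e.g.
    /usr/local/lam-7.1.2-pgi. environ defaults to os.environ; passing a
    dict makes the function easy to exercise without touching the real
    environment. (Hypothetical helper, not part of LAM or Torque.)
    """
    env = os.environ if environ is None else environ
    path_value = env.get("PATH", "")
    # Normalise entries so that a trailing slash does not hide a match.
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    expected_bin = os.path.normpath(os.path.join(lam_prefix, "bin"))
    return {
        "lamhome": env.get("LAMHOME"),
        "bin_on_path": expected_bin in entries,
        # First tkill found along PATH, or None if there is none.
        "tkill": shutil.which("tkill", path=path_value),
    }


if __name__ == "__main__":
    # Print one line per node so the outputs can be compared side by side.
    print(socket.gethostname(), lam_path_report("/usr/local/lam-7.1.2-pgi"))
```

If the line printed on a slave node shows bin_on_path as False, or a
tkill from a different directory than on the master node, then the
.cshrc settings are probably not reaching the processes on that node.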

LAM-MPI works with the "rsh" boot schema on the same test cluster,
so the problem seems to be specific to the "tm" boot schema.
The funny thing is that the problem cropped up after I updated
the Torque version. With torque-1.2.0p4 things were just fine
(though maybe the default schema was "rsh" back then; we don't know).

Thanks,
Ole