LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-11-15 06:52:03


It looks like our FAQ needs to be updated. I think that question was
written in pre-TM-boot-module days.

However, even if you specify $PBS_NODEFILE with lamboot, the tm boot
SSI module will ignore it, so it's harmless. You might want to read the
TM boot module writeup in the LAM User's Guide; it is probably the most
definitive source of information for users.

You did not answer my question, however:

>> Question: are you invoking lamboot in a PBS (Torque) job? The TM
>> boot module will only work when you launch lamboot from *inside* a
>> PBS (Torque) job.

The fact that you are trying to use $PBS_NODEFILE implies that you're
inside a PBS job, but the error message that you sent in your previous
post:

>>> n-1<28345> ssi:boot:tm: not running under PBS

implies that you were trying from outside a PBS job (which is why the
tm boot module disqualified itself). Have you tried lamboot -d from
*inside* a PBS job?
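
To make the inside-versus-outside distinction concrete, here is a
minimal sketch of a check one could put in front of lamboot. It assumes
that PBS/Torque exports PBS_JOBID and PBS_ENVIRONMENT into a job's
environment and treats their presence as a stand-in for "running under
PBS". The function names in_pbs_job and describe_pbs_context are
hypothetical helpers, not part of LAM or PBS, and the tm module's own
test may differ in detail.

```shell
#!/bin/sh
# Sketch only: in_pbs_job and describe_pbs_context are hypothetical
# helpers, not part of LAM or PBS.
# Assumption: PBS/Torque sets PBS_JOBID and PBS_ENVIRONMENT inside a job.

# Succeeds (exit status 0) only when both variables are set and non-empty.
in_pbs_job() {
    [ -n "${PBS_JOBID:-}" ] && [ -n "${PBS_ENVIRONMENT:-}" ]
}

# Prints a one-line description of where the script is running.
describe_pbs_context() {
    if in_pbs_job; then
        echo "inside PBS job ${PBS_JOBID}"
    else
        echo "not inside a PBS job"
    fi
}
```

In a script submitted with qsub, one might then write
"in_pbs_job && lamboot -d -ssi boot tm"; run from a login shell, the
check fails and lamboot is never attempted.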

On Nov 14, 2004, at 10:23 PM, Konstantin Skaburskas wrote:

>
> Thank you for the answer.
>
> I also found this link to the FAQ with the proper description:
> http://www.lam-mpi.org/faq/category12.php3#question3
>
> and read a very instructive article:
> http://www.lam-mpi.org/papers/hpcs2003/tm-implementation.pdf
>
> In the example PBS script, the FAQ advises using the $PBS_NODEFILE
> variable to specify a hostfile, while the article says: "... A host
> file is not needed, as LAM will obtain a list of target hosts from
> PBS." - 4.1 The TM Boot Module, page 4.
>
>
> Konstantin
>
> On Sat, 13 Nov 2004, Jeff Squyres wrote:
>
>> In terms of compiling and installing, it all looks good: the TM boot
>> module appears to have been installed properly.
>>
>> So the question is: why isn't it available at run-time?
>>
>> Question: are you invoking lamboot in a PBS (Torque) job? The TM
>> boot module will only work when you launch lamboot from *inside* a
>> PBS (Torque) job.
>>
>>
>>
>> On Nov 11, 2004, at 4:32 PM, Konstantin Skaburskas wrote:
>>
>>> Hi,
>>>
>>> First, I compiled and installed Torque 1.1.0p4 to /usr/local, then
>>> configured and ran the server and mom. Then I configured LAM 7.1.1
>>> with
>>>
>>> CC=icc
>>> CXX=icc
>>> FC=ifort
>>> export CC CXX FC
>>> ../lam-7.1.1/configure \
>>> --prefix=/usr/local/lam-7.1.1_intel_tm \
>>> --with-trillium \
>>> --with-prefix-memcpy \
>>> --with-debug \
>>> --with-tv-debug \
>>> --enable-tv-dll-force \
>>> --enable-shared \
>>> --with-purify \
>>> --with-rsh="ssh -x" \
>>> --with-boot-tm=/usr/local
>>>
>>> After LAM compilation it seems that the TM module was compiled and
>>> got into the static and shared libs (I can see ssi_boot_tm*.o in
>>> share/ssi/boot/tm/src/ and /lam/install/dir/lib/liblam.a, and 'nm'
>>> gives 'T' and 'D' for lam_ssi_boot_tm* and tm_* in liblam.so).
>>> However, when I try to run lamboot with the TM module I get:
>>>
>>>> lamboot -d -ssi boot tm
>>> n-1<28345> ssi:boot:open: opening
>>> n-1<28345> ssi:boot:open: looking for boot module named tm
>>> n-1<28345> ssi:boot:open: opening boot module tm
>>> n-1<28345> ssi:boot:open: opened boot module tm
>>> n-1<28345> ssi:boot:select: initializing boot module tm
>>> n-1<28345> ssi:boot:tm: not running under PBS
>>> n-1<28345> ssi:boot:select: boot module not available: tm
>>> n-1<28345> ssi:boot:select: no boot moduless available!
>>> -----------------------------------------------------------------------------
>>> No SSI boot modules said that they were available to run. This should
>>> not happen.
>>> -----------------------------------------------------------------------------
> ...
>>>
>>> Thank you in advance,
>>> Konstantin
>>>
>>> <conf.tar.gz>
>>> _______________________________________________
>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>> --
>> {+} Jeff Squyres
>> {+} jsquyres_at_[hidden]
>> {+} http://www.lam-mpi.org/
>>
>>
>
> -----------------------------------------------------------------------
> - Konstantin Skaburskas, Ph.D. student konstan_at_[hidden] -
> - Inst. of Experimental Physics & Technology http://www.ut.ee -
> - University of Tartu phone: +372 7 374843 -
> - Tahe 4-234, Tartu, 51010, Estonia fax: +372 7 375858 -
> -----------------------------------------------------------------------
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/