On Mon, 15 Nov 2004, Jeff Squyres wrote:
> It looks like our FAQ needs to be updated. I think that question was written
> in pre-TM-boot-module days.
>
> However, even if you specify $PBS_NODEFILE with lamboot, the tm boot SSI will
> ignore it. So it's harmless. You might want to read the TM boot module
> writeup in the LAM User's Guide. This is perhaps the most definitive source
> of information for users.
Thanks.
> You did not answer my question, however:
>
>>> Question: are you invoking lamboot in a PBS (Torque) job? The TM boot module
>>> will only work when you launch lamboot from *inside* a PBS (Torque) job.
Sorry. I thought that "between the lines" it was clear that it's all my
fault :) -- I really tried to run 'lamboot -ssi boot tm' from outside a
PBS job.
>
> The fact that you are trying to use $PBS_NODEFILE implies that you're inside
> a PBS job, but the error message that you sent in your previous post:
>
>>>> n-1<28345> ssi:boot:tm: not running under PBS
>
> implies that you were trying from outside a PBS job (which is why the tm boot
> module disqualified itself). Have you tried lamboot -d from *inside* a PBS
> job?
No, I haven't. I did run 'lamboot -ssi boot tm' just from the command
line.
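
For anyone who hits the same "not running under PBS" message: a small guard
can catch the mistake before lamboot is called at all. This is only a sketch.
As far as I understand it, the tm module decides by looking for the
PBS_ENVIRONMENT and PBS_JOBID environment variables; that detail, and the
function names under_pbs and suggest_lamboot, are my assumptions and not
something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when both PBS variables are set.
# Assumption: these are the variables the tm boot module itself looks for.
under_pbs() {
    [ -n "${PBS_ENVIRONMENT:-}" ] && [ -n "${PBS_JOBID:-}" ]
}

# Hypothetical helper: print what makes sense in the current environment.
suggest_lamboot() {
    if under_pbs; then
        echo "lamboot -ssi boot tm"
    else
        echo "not inside a PBS job: submit a job script with qsub instead"
    fi
}

suggest_lamboot
```

Run from a plain login shell this prints the second message; run from inside
a job script it prints the lamboot command line.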
Regards,
Konstantin
>
> On Nov 14, 2004, at 10:23 PM, Konstantin Skaburskas wrote:
>
>>
>> Thank you for the answer.
>>
>> I also found this FAQ entry with the relevant description:
>> http://www.lam-mpi.org/faq/category12.php3#question3
>>
>> and read a very instructive article:
>> http://www.lam-mpi.org/papers/hpcs2003/tm-implementation.pdf
>>
>> In its example PBS script, the FAQ advises using the $PBS_NODEFILE variable
>> to specify a hostfile, while the article says: "... A host file is not
>> needed, as LAM will obtain a list of target hosts from PBS." (Section 4.1,
>> The TM Boot Module, page 4.)
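
The article's point can be sketched as a job script that contains no hostfile
at all. The snippet below only writes such a script to disk; it does not
submit anything. The file name lam_tm_job.sh, the job name, the node count
and the application ./my_mpi_app are placeholders made up for illustration
and are not taken from the FAQ or the article.

```shell
#!/bin/sh
# Write a hypothetical PBS job script that boots LAM via the tm module.
# Assumptions (not from the FAQ or the article): the file name, the job name,
# the node count and the application name ./my_mpi_app.
cat > lam_tm_job.sh <<'EOF'
#!/bin/sh
#PBS -N lam_tm_test
#PBS -l nodes=2

# Go back to the directory the job was submitted from.
cd "$PBS_O_WORKDIR" || exit 1

# No hostfile argument: inside the job the tm module asks PBS for the nodes.
lamboot -ssi boot tm || exit 1

# "C" is LAM's mpirun notation for one process per available CPU.
mpirun C ./my_mpi_app

# Shut the LAM universe down before the job ends.
lamhalt
EOF

echo "wrote lam_tm_job.sh"
# Submit it from a login node with: qsub lam_tm_job.sh
```

The lamboot line only works because it runs inside the job; the same command
typed at a login prompt gives the "not running under PBS" message.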
>>
>>
>> Konstantin
>>
>> On Sat, 13 Nov 2004, Jeff Squyres wrote:
>>
>>> In terms of compiling and installing, it all looks good: the TM boot
>>> module appears to have been installed properly.
>>>
>>> So the question is: why isn't it available at run-time?
>>>
>>> Question: are you invoking lamboot in a PBS (Torque) job? The TM boot module
>>> will only work when you launch lamboot from *inside* a PBS (Torque) job.
>>>
>>>
>>>
>>> On Nov 11, 2004, at 4:32 PM, Konstantin Skaburskas wrote:
>>>
>>>> Hi,
>>>>
>>>> First, I compiled and installed Torque 1.1.0p4 to /usr/local, then
>>>> configured and ran the server and mom. Then I configured LAM 7.1.1 with
>>>>
>>>> CC=icc
>>>> CXX=icc
>>>> FC=ifort
>>>> export CC CXX FC
>>>> ../lam-7.1.1/configure \
>>>> --prefix=/usr/local/lam-7.1.1_intel_tm \
>>>> --with-trillium \
>>>> --with-prefix-memcpy \
>>>> --with-debug \
>>>> --with-tv-debug \
>>>> --enable-tv-dll-force \
>>>> --enable-shared \
>>>> --with-purify \
>>>> --with-rsh="ssh -x" \
>>>> --with-boot-tm=/usr/local
>>>>
>>>> After LAM compilation it seems that the TM module was compiled and made it
>>>> into the static and shared libs (I can see ssi_boot_tm*.o in
>>>> share/ssi/boot/tm/src/ and in /lam/install/dir/lib/liblam.a, and 'nm'
>>>> gives 'T' and 'D' for lam_ssi_boot_tm* and tm_* in liblam.so). However,
>>>> when I try to run lamboot with the TM module I get:
>>>>
>>>>> lamboot -d -ssi boot tm
>>>> n-1<28345> ssi:boot:open: opening
>>>> n-1<28345> ssi:boot:open: looking for boot module named tm
>>>> n-1<28345> ssi:boot:open: opening boot module tm
>>>> n-1<28345> ssi:boot:open: opened boot module tm
>>>> n-1<28345> ssi:boot:select: initializing boot module tm
>>>> n-1<28345> ssi:boot:tm: not running under PBS
>>>> n-1<28345> ssi:boot:select: boot module not available: tm
>>>> n-1<28345> ssi:boot:select: no boot moduless available!
>>>> -----------------------------------------------------------------------------
>>>> No SSI boot modules said that they were available to run. This should
>>>> not happen.
>>>> -----------------------------------------------------------------------------
>> ...
>>>>
>>>> Thank you in advance,
>>>> Konstantin
>>>>
>>>> <conf.tar.gz>
>>>> _______________________________________________
>>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>>
>>> --
>>> {+} Jeff Squyres
>>> {+} jsquyres_at_[hidden]
>>> {+} http://www.lam-mpi.org/
>>>
>>>
>>