LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2007-07-02 23:33:11


Unfortunately, it appears that the option --without-boot-tm does not
work with LAM. I've fixed that problem in Subversion, and will
create a new beta in the next couple of days with the fix. In the
meantime, the only real option is to remove the Torque library
available on your compile node, which shouldn't be a problem if you
aren't using Torque.
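
To see which LAM binaries on a node still depend on the Torque library, one could inspect them with ldd. The following is only a sketch: the helper name links_torque is made up for illustration, and it assumes ldd is available and that lamboot, hboot, and tkill (the programs named in the errors quoted below) are in $PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM): reads `ldd` output on stdin and
# succeeds if any line mentions libtorque.
links_torque() {
    grep -q 'libtorque\.so'
}

# Check the LAM programs that failed in the report quoted below.
# Programs that are not found in $PATH are skipped silently.
for bin in lamboot hboot tkill; do
    path=$(command -v "$bin") || continue
    if ldd "$path" 2>/dev/null | links_torque; then
        echo "$bin: linked against libtorque"
    fi
done
```

Running this on both the head node and a compute node should show whether the dependency is in the binaries themselves or only missing on some machines.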

The default compile mode with LAM/MPI is to build static libraries.
The files Jeff is talking about will only exist if you enabled shared
libraries and disabled static libraries. That mode isn't quite as
well tested as the default, so I'd be hesitant to go that route.
With the default mode of static libraries, there's no special file
you can remove from an install to disable Torque support.
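
Whichever build mode is used, laminfo shows whether the tm boot module was compiled in. Here is a small sketch, assuming laminfo prints one "SSI boot:" line per module as in the output quoted below; the function name has_boot_tm is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM): reads laminfo output on stdin and
# succeeds if a "SSI boot: tm" line is present.
has_boot_tm() {
    grep -Eq 'SSI boot:[[:space:]]*tm([[:space:]]|$)'
}

# Only run the check when laminfo is actually installed.
if command -v laminfo >/dev/null 2>&1; then
    if laminfo | has_boot_tm; then
        echo "tm boot module is present"
    else
        echo "tm boot module is not present"
    fi
fi
```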

Hope this helps,

Brian

On Jul 2, 2007, at 4:16 AM, Jeff Squyres wrote:

> Yes, you are correct that libtorque is related to the tm boot SSI and
> the Torque queueing system. I think you want to use --without-boot-tm,
> and that should deactivate the boot tm module. Failing that, you
> should be able to rm the $prefix/lib/lam/*boot_tm* files (that's from
> memory -- double check before rm'ing that!). It should be fairly
> obvious which files to remove -- there should be a .lo and a .la that
> have "boot" and "tm" in them. These are the TM plugins; if you
> remove them, LAM won't have any knowledge of the tm system and you
> should be fine.
>
>
> On Jun 26, 2007, at 11:21 PM, Jens.Klostermann_at_[hidden]
> wrote:
>
>> Somehow it didn't work with the attachments, so here it is again
>> without them, but harder to read.
>>
>> I am trying to run lam-7.1.3 with InfiniBand. My configuration looks like:
>> ---------------------
>> ./configure \
>>   --prefix=/home/pub/OpenFOAM/OpenFOAM-1.4/src/lam-7.1.3/platforms/linux64Gcc4DPOpt \
>>   --with-rpi-ib=/usr/ibgd/driver/infinihost --with-rpi=ib \
>>   --enable-shared --disable-static \
>>   --without-romio --without-mpi2cpp --without-profiling \
>>   --without-fc --without-tm --with-boot=rsh --with-rsh=ssh -x
>> ---------------------
>>
>> This compiles without a problem, but unfortunately I can't switch
>> off the "SSI boot: tm" module, can I?
>>
>> This can be seen by laminfo, which gives the following:
>> ---------------------
>> LAM/MPI: 7.1.3
>> Prefix: /home/pub/OpenFOAM/OpenFOAM-1.4/src/lam-7.1.3/platforms/linux64Gcc4DPOpt
>> Architecture: x86_64-unknown-linux-gnu
>> Configured by: klosterm
>> Configured on: Tue Jun 26 22:32:28 CEST 2007
>> Configure host: stokes
>> Memory manager: ptmalloc2
>> C bindings: yes
>> C++ bindings: no
>> Fortran bindings: no
>> C compiler: gcc
>> C++ compiler: g++
>> Fortran compiler: false
>> Fortran symbols: none
>> C profiling: no
>> C++ profiling: no
>> Fortran profiling: no
>> C++ exceptions: no
>> Thread support: yes
>> ROMIO support: no
>> IMPI support: no
>> Debug support: no
>> Purify clean: no
>> SSI boot: globus (API v1.1, Module v0.6)
>> SSI boot: rsh (API v1.1, Module v1.1)
>> SSI boot: slurm (API v1.1, Module v1.0)
>> SSI boot: tm (API v1.1, Module v1.1)
>> SSI coll: lam_basic (API v1.1, Module v7.1)
>> SSI coll: shmem (API v1.1, Module v1.0)
>> SSI coll: smp (API v1.1, Module v1.2)
>> SSI rpi: crtcp (API v1.1, Module v1.1)
>> SSI rpi: ib (API v1.1, Module v1.0)
>> SSI rpi: lamd (API v1.0, Module v7.1)
>> SSI rpi: sysv (API v1.0, Module v7.1)
>> SSI rpi: tcp (API v1.0, Module v7.1)
>> SSI rpi: usysv (API v1.0, Module v7.1)
>> SSI cr: self (API v1.0, Module v1.0)
>> ---------------------
>>
>>
>> So here is my problem: lamboot is asking for libtorque.so.0, which
>> seems to be related to the Torque batch system. Since our cluster
>> doesn't use any batch system, I would like to switch off the tm module
>> (this is the reason I used --without-tm as a configure option, which
>> obviously did not work):
>> ---------------------
>> lamboot -v -ssi boot rsh ./knotenliste_lam
>>
>> LAM 7.1.3 - Indiana University
>>
>> n-1<6016> ssi:boot:base:linear: booting n0 (stokes)
>> n-1<6016> ssi:boot:base:linear: booting n1 (node13)
>> ERROR: LAM/MPI unexpectedly received the following on stderr:
>> hboot: error while loading shared libraries: libtorque.so.0: cannot open shared object file: No such file or directory
>> -----------------------------------------------------------------------------
>> LAM failed to execute a LAM binary on the remote node "node13".
>> Since LAM was already able to determine your remote shell as "hboot",
>> it is probable that this is not an authentication problem.
>>
>> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
>> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
>> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
>> *** MAILING LIST.
>>
>> LAM tried to use the remote agent command "ssh"
>> to invoke the following command:
>>
>> ssh -x node13 -n hboot -t -c lam-conf.lamd -v -s -I '"-H 139.20.53.201 -P 29989 -n 1 -o 0"'
>>
>> This can indicate several things. You should check the following:
>>
>> - The LAM binaries are in your $PATH
>> - You can run the LAM binaries
>> - The $PATH variable is set properly before your
>> .cshrc/.profile exits
>>
>> Try to invoke the command listed above manually at a Unix prompt.
>>
>> You will need to configure your local setup such that you will *not*
>> be prompted for a password to invoke this command on the remote node.
>> No output should be printed from the remote node before the output of
>> the command is displayed.
>>
>> When you can get this command to execute successfully by hand, LAM
>> will probably be able to function properly.
>> -----------------------------------------------------------------------------
>> n-1<6016> ssi:boot:base:linear: Failed to boot n1 (node13)
>> n-1<6016> ssi:boot:base:linear: aborted!
>> n-1<6022> ssi:boot:base:linear: booting n0 (stokes)
>> n-1<6022> ssi:boot:base:linear: booting n1 (node13)
>> ERROR: LAM/MPI unexpectedly received the following on stderr:
>> tkill: error while loading shared libraries: libtorque.so.0: cannot open shared object file: No such file or directory
>> -----------------------------------------------------------------------------
>> LAM failed to execute a LAM binary on the remote node "node13".
>> Since LAM was already able to determine your remote shell as "tkill",
>> it is probable that this is not an authentication problem.
>>
>> *** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
>> *** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
>> *** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
>> *** MAILING LIST.
>>
>> LAM tried to use the remote agent command "ssh"
>> to invoke the following command:
>>
>> ssh -x node13 -n tkill -v
>>
>> This can indicate several things. You should check the following:
>>
>> - The LAM binaries are in your $PATH
>> - You can run the LAM binaries
>> - The $PATH variable is set properly before your
>> .cshrc/.profile exits
>>
>> Try to invoke the command listed above manually at a Unix prompt.
>>
>> You will need to configure your local setup such that you will *not*
>> be prompted for a password to invoke this command on the remote node.
>> No output should be printed from the remote node before the output of
>> the command is displayed.
>>
>> When you can get this command to execute successfully by hand, LAM
>> will probably be able to function properly.
>> -----------------------------------------------------------------------------
>> n-1<6022> ssi:boot:base:linear: Failed to boot n1 (node13)
>> n-1<6022> ssi:boot:base:linear: aborted!
>> lamboot did NOT complete successfully
>> klosterm_at_stokes:/home/pub/infiniband/tests> ssh -x node13 -n tkill
>> tkill: error while loading shared libraries: libtorque.so.0: cannot open shared object file: No such file or directory
>> ---------------------
>>
>>
>> The funny thing is lamboot with just localhost works on the frontend:
>> --------------------
>> lamboot -v -ssi boot rsh
>>
>> LAM 7.1.3 - Indiana University
>>
>> n-1<7868> ssi:boot:base:linear: booting n0 (localhost)
>> n-1<7868> ssi:boot:base:linear: finished
>> --------------------
>>
>> but not on node 13:
>> --------------------
>> lamboot -v -ssi boot rsh
>> lamboot: error while loading shared libraries: libtorque.so.0: cannot open shared object file: No such file or directory
>> klosterm_at_node13:~>
>> klosterm_at_node13:~> LAM 7.1.3 - Indiana University
>> -bash: LAM: command not found
>> klosterm_at_node13:~>
>> klosterm_at_node13:~> n-1<7868> ssi:boot:base:linear: booting n0 (localhost)
>> -bash: syntax error near unexpected token `7868'
>> klosterm_at_node13:~> n-1<7868> ssi:boot:base:linear: finished
>> -bash: syntax error near unexpected token `7868'
>> --------------------
>>
>> Any help is appreciated.
>>
>> With regards Jens
>>
>>
>>
>>
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> Jeff Squyres
> Cisco Systems
>

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!