This output confirms that tm built properly and is integrated into your
LAM/MPI installation.
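
For future reference, the same check can be scripted. This is only a rough sketch: has_boot_module is a hypothetical helper (not part of LAM), and it assumes that laminfo lists each module on a line of the form "SSI boot: <name> ...", which may differ between versions.

```shell
# has_boot_module is a hypothetical helper, not part of LAM.
# It reads laminfo-style text on stdin and succeeds if the named boot
# module is listed.
# ASSUMPTION: modules appear on lines like "SSI boot: <name> (...)".
has_boot_module() {
    grep -Eq "SSI boot:[[:space:]]*$1([[:space:]]|\$)"
}
```

Something like "laminfo | has_boot_module tm && echo tm is present" would then report whether the tm boot module made it into the build.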
The problem appears to be here in the output from lamboot:
n-1<794> ssi:boot:tm: starting wipe on (x.grid.umich.edu)
Can't find executable for tkill
"tkill" is one of the LAM executables. If it can't be found, lamboot
is going to abort (and it did).
However, I can't figure out how lamboot would be found but tkill would
not (they should be in the same directory). Is LAM's installation
directory in your PATH? (specifically, $prefix/bin) This is going to
sound dumb, but can you verify that both tkill and lamboot exist in the
same directory and are both executable by you?
Specifically, put a "which lamboot" and "which tkill" at the top of
your PBS script -- let's triple check that you're getting all the
"right" executables. What the heck -- put a "laminfo" in there, too --
we can verify that you're finding the right laminfo, etc.
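
If it helps, here is a rough sketch of that check as a shell function you could paste near the top of the PBS script. check_same_dir is a hypothetical helper (not part of LAM); it uses "command -v" rather than "which", but the idea is the same: print where each executable resolves from and complain if they do not all live in the same directory.

```shell
# check_same_dir is a hypothetical helper, not part of LAM.
# For each named executable, print the path it resolves to via PATH and
# fail if any is missing or lives in a different directory than the first.
check_same_dir() {
    first_dir=""
    for exe in "$@"; do
        exe_path=$(command -v "$exe") || {
            echo "$exe: not found in PATH"
            return 1
        }
        exe_dir=$(dirname "$exe_path")
        echo "$exe -> $exe_path"
        if [ -z "$first_dir" ]; then
            first_dir=$exe_dir
        elif [ "$exe_dir" != "$first_dir" ]; then
            echo "$exe is in $exe_dir, expected $first_dir"
            return 1
        fi
    done
    return 0
}
```

In the PBS script this might be called as "check_same_dir lamboot tkill laminfo" right before the lamboot line, followed by a plain "laminfo" so that its output ends up in the job's output file as well.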
On May 27, 2005, at 5:01 PM, Beth Kirschner wrote:
> Thanks for the quick reply. In regards to your last comment (that the
> rsh module was chosen and not the tm module), I had sent you the
> output from my build _without_ the tm module -- so this is to be
> expected.
>
> I've attached a compressed tarball with the following files:
>
> config.log -- lam-7.1.1/config.log
> configure.out -- output of configure command (with tm module)
> laminfo.txt -- output of laminfo
> mpi.err -- output from trying to run lamboot
> tm.config.log -- lam-7.1.1/share/ssi/boot/tm/config.log
>
> Thanks in advance for any help,
> - Beth
>
> Jeff Squyres wrote:
>
>> On May 27, 2005, at 9:29 AM, Beth Kirschner wrote:
>>
>>> I'm having trouble getting 'lamboot' to execute from within a PBS
>>> script on a Mac OSX box. It runs fine without PBS. Has anyone else
>>> had success with this?
>>
>>
>> I *think* that we have tested this (PBS/Torque on OSX), but I can't
>> swear to it. Hypothetically, it *should* be the same as it is on
>> Linux -- there really shouldn't be any difficulties with this. If
>> there are, it's a bug that we should fix.
>>
>>> I've tried building Lam 7.1.1 in two configurations:
>>>
>>> # configure --prefix=/usr/local/lam-7.1.1 -with-rsh="ssh -x"
>>> --without-fc
>>> # configure --prefix=/usr/local/lam-7.1.1 -with-rsh="ssh -x"
>>> --without-fc --with-boot=tm --with-boot-tm=/usr/local/pbs
>>
>>
>> Can you send the output of the latter? I'd like to see the full
>> output of the configure including the --with-boot... switches (please
>> compress). Also send the corresponding config.log file, and
>> share/ssi/boot/tm/config.log.
>>
>> You can also check to ensure that the TM support built properly by
>> running the laminfo command. It will show you all the modules that
>> were built into LAM. If the "tm" boot module is not listed, then the
>> PBS/Torque support did not build properly.
>>
>> If it did not build properly, the output from configure should shed
>> light on the reason why (the determination of whether to build a
>> given module or not is made during configure).
>>
>>> Here's the script I've been running:
>>>
>>> #PBS -l nodes=1:ppn=2
>>> /usr/local/lam-7.1.1/bin/lamboot -d -v ${PBS_NODEFILE}
>>>
>>> Here's some of the output:
>>>
>>> n-1<7964> ssi:boot:base:server: opened port 55040
>>> n-1<7964> ssi:boot:base:linear: booting n0 (x.grid.umich.edu)
>>> n-1<7964> ssi:boot:rsh: starting lamd on (x.grid.umich.edu)
>>> n-1<7964> ssi:boot:rsh: starting on n0 (x.grid.umich.edu): hboot
>>> -t -c lam-conf.lamd -d -v -sessionsuffix pbs-3497.x.grid.umich.edu
>>> -I -H 141.211.23.234 -P 55040 -n 0 -o 0
>>> n-1<7964> ssi:boot:rsh: launching locally
>>> n-1<7964> ssi:boot:base:linear: Failed to boot n0
>>> (x.grid.umich.edu)
>>> n-1<7964> ssi:boot:base:server: closing server socket
>>> n-1<7964> ssi:boot:base:linear: aborted!
>>> lamboot did NOT complete successfully
>>
>>
>> Note that the rsh module was chosen instead of the tm module -- this
>> seems to imply that the tm support was not built and included in your
>> LAM installation. Can't say this for sure without the other data
>> (see above), but it's one possible explanation.
>>
> <lam.tar.gz>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/