Thanks for the quick reply. Regarding your last comment (that the
rsh module was chosen and not the tm module): the output I sent was
from my build _without_ the tm module, so this is to be expected.
I've attached a compressed tarball with the following files:
config.log -- lam-7.1.1/config.log
configure.out -- output of configure command (with tm module)
laminfo.txt -- output of laminfo
mpi.err -- output from trying to run lamboot
tm.config.log -- lam-7.1.1/share/ssi/boot/tm/config.log
Thanks in advance for any help,
- Beth
Jeff Squyres wrote:
> On May 27, 2005, at 9:29 AM, Beth Kirschner wrote:
>
>> I'm having trouble getting 'lamboot' to execute from within a PBS
>> script on a Mac OSX box. It runs fine without PBS. Has anyone else
>> had success with this?
>
>
> I *think* that we have tested this (PBS/Torque on OSX), but I can't
> swear to it. Hypothetically, it *should* be the same as it is on
> Linux -- there really shouldn't be any difficulties with this. If
> there are, it's a bug that we should fix.
>
>> I've tried building LAM 7.1.1 in two configurations:
>>
>> # configure --prefix=/usr/local/lam-7.1.1 -with-rsh="ssh -x"
>> --without-fc
>> # configure --prefix=/usr/local/lam-7.1.1 -with-rsh="ssh -x"
>> --without-fc --with-boot=tm --with-boot-tm=/usr/local/pbs
>
>
> Can you send the output of the latter? I'd like to see the full
> output of the configure including the --with-boot... switches (please
> compress). Also send the corresponding config.log file, and
> share/ssi/boot/tm/config.log.
>
> You can also check to ensure that the TM support built properly by
> running the laminfo command. It will show you all the modules that
> were built into LAM. If the "tm" boot module is not listed, then the
> PBS/Torque support did not build properly.
>
> If it did not build properly, the output from configure should shed
> light on the reason why (the determination of whether to build a given
> module or not is made during configure).
>
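The laminfo check described above could be scripted roughly as follows. This is only a sketch: the helper name has_tm_boot_module is hypothetical, and the "SSI boot: tm" line format is an assumption about what laminfo prints, so the pattern should be compared against real laminfo output before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: reads laminfo output on stdin and succeeds only if
# a boot module named "tm" is listed.
# ASSUMPTION: laminfo prints one line per module, shaped like
#   "SSI boot: tm (API ..., Module ...)"
# Adjust the pattern if the real output differs.
has_tm_boot_module() {
    grep -Eq 'SSI boot:[[:space:]]*tm([[:space:]]|$)'
}

# Example use (assumes laminfo is on the PATH):
#   if laminfo | has_tm_boot_module; then
#       echo "tm boot module is present"
#   else
#       echo "tm boot module is missing; check the configure output"
#   fi
```

If the check fails, the configure output and share/ssi/boot/tm/config.log are the places to look for the reason the module was skipped.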
>> Here's the script I've been running:
>>
>> #PBS -l nodes=1:ppn=2
>> /usr/local/lam-7.1.1/bin/lamboot -d -v ${PBS_NODEFILE}
>>
>> Here's some of the output:
>>
>> n-1<7964> ssi:boot:base:server: opened port 55040
>> n-1<7964> ssi:boot:base:linear: booting n0 (x.grid.umich.edu)
>> n-1<7964> ssi:boot:rsh: starting lamd on (x.grid.umich.edu)
>> n-1<7964> ssi:boot:rsh: starting on n0 (x.grid.umich.edu): hboot
>> -t -c lam-conf.lamd -d -v -sessionsuffix pbs-3497.x.grid.umich.edu -I
>> -H 141.211.23.234 -P 55040 -n 0 -o 0
>> n-1<7964> ssi:boot:rsh: launching locally
>> n-1<7964> ssi:boot:base:linear: Failed to boot n0 (x.grid.umich.edu)
>> n-1<7964> ssi:boot:base:server: closing server socket
>> n-1<7964> ssi:boot:base:linear: aborted!
>> lamboot did NOT complete successfully
>
>
> Note that the rsh module was chosen instead of the tm module -- this
> seems to imply that the tm support was not built and included in your
> LAM installation. Can't say this for sure without the other data (see
> above), but it's one possible explanation.
>