Hi Jeff,
Thanks for enlightening me. The problem was indeed caused by an inconsistent TMPDIR.
I solved it by setting TMPDIR to a fixed directory and exporting it.
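A minimal sketch of this kind of fix, for anyone finding the thread later. The directory /tmp is an assumption here (the message above does not say which directory was used), and fixed_tmpdir is just an illustrative variable name. The idea is only that TMPDIR points at a path that exists on the head node and on every back-end node before lamboot runs.

```shell
# Sketch: pin TMPDIR to one path that is expected to exist on every node.
# /tmp is an assumption; substitute whatever directory your cluster provides.
fixed_tmpdir=/tmp

if [ -d "$fixed_tmpdir" ] && [ -w "$fixed_tmpdir" ]; then
    export TMPDIR="$fixed_tmpdir"
else
    echo "error: $fixed_tmpdir is not a writable directory" >&2
fi

# lamboot would then be started from this same shell (for example inside
# the PBS job script), so that it inherits the exported TMPDIR.
```

Putting the export in the job script rather than only in .bashrc is one way to make sure the value is the same regardless of how the shell on each node was started.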
Best wishes,
Yan
On January 28 2008 at 9:09 AM, Jeff Squyres wrote:
The hostfile argument is ignored when you're using the TM boot SSI in
LAM. So it's not surprising that the same messages occur.
I wonder if $TMPDIR is different between the head node and your back-end
computing nodes, such that LAM tries to make a directory on the back-end
nodes but can't because $TMPDIR doesn't exist there...? (or something
like that)
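One way to test that guess, sketched under some assumptions: the nodes are reachable over ssh (the original post built LAM with --with-rsh=ssh), and the directory path contains no spaces. The function name check_tmpdir and the RUN_ON_NODE hook are hypothetical names made up for this example, not part of LAM or Torque.

```shell
# Sketch: report which nodes in a node file are missing a given directory.
# RUN_ON_NODE is a hypothetical hook for swapping the remote command;
# it defaults to ssh.
check_tmpdir() {
    nodefile=$1
    dir=$2
    missing=0
    # sort -u collapses repeated hostnames (PBS lists a host once per slot).
    for host in $(sort -u "$nodefile"); do
        if ${RUN_ON_NODE:-ssh} "$host" test -d "$dir" </dev/null; then
            echo "$host: $dir exists"
        else
            echo "$host: $dir MISSING"
            missing=1
        fi
    done
    # Non-zero exit status if any node lacked the directory.
    return "$missing"
}

# Inside the interactive PBS job, something like:
#   check_tmpdir "$PBS_NODEFILE" "$TMPDIR"
```

If any node is reported as MISSING, that would be consistent with the mkdir and chdir failures in the lamboot output.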
Have you tried Open MPI?
On Jan 27, 2008, at 7:52 PM, Wu Yan wrote:
> The same error messages were returned.
>
> Cheers,
> Yan
>
>
> On Friday, January 25 2008 at 4:50 PM SCIPIONI Roberto wrote:
>
> Did you try using just
>
> lamboot $PBS_NODEFILE
>
> in your script?
>
>
> Roberto Scipioni
> ICYS, NIMS
> Japan
>
>> Hi,
>> My LAM/MPI was compiled with the following flags:
>>
>> ./configure --prefix=/HOME02/snc/snc0602/lam7 --enable-shared \
>>     --with-modules --with-trillium --with-rsh=ssh --with-fc=gfortran \
>>     --with-boot-tm=/opt/torque --disable-static
>>
>> Then I added:
>> export PATH=/HOME02/snc/snc0602/lam7/bin:$PATH
>> export LD_LIBRARY_PATH=/HOME02/snc/snc0602/lam7/lib:$LD_LIBRARY_PATH
>> to .bashrc
>>
>> Then I tried to run the following commands:
>> qsub -I -l nodes=4:ppn=1 -l walltime=1:00:00
>> cat $PBS_NODEFILE >> $PBS_O_WORKDIR/lam-bhost.def
>> cd $PBS_O_WORKDIR
>> lamboot -v lam.def
>>
>> The following error message was returned:
>>
>> LAM 7.1.4/MPI 2 C++/ROMIO - Indiana University
>>
>> n-1<2543> ssi:boot:base:linear_windowed: booting n0 (opto009)
>> mkdir: No such file or directory
>> n-1<2543> ssi:boot:base:linear_windowed: booting n1 (opto008)
>> mkdir: No such file or directory
>> chdir failed!: No such file or directory
>> mkdir: No such file or directory
>> mkdir: No such file or directory
>> chdir failed!: No such file or directory
>>
>> However, if I don't use qsub but instead ssh into that node and run
>> lamboot, no error message is returned. Can anybody help me with this?
>> Thanks!
>>
>> Cheers,
>> Yan
>>
--
Jeff Squyres
Cisco Systems