LAM/MPI General User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-02-27 12:05:23


On Feb 27, 2006, at 11:40 AM, Jeffrey B. Layton wrote:

> Josh,
>
> I pass it the list of nodes from PBS using $PBS_NODEFILE.
> Is it looking for a default hostfile that is perhaps different
> than what I pass to lamboot?

Ah, that makes more sense. If you are using PBS, lamboot will
automatically read the $PBS_NODEFILE environment variable and try
to 'do the right thing'. So all you should have to do is run
'lamboot' with no arguments while inside your allocation.
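
A minimal sketch of that choice for a job script follows. It assumes lamboot picks up $PBS_NODEFILE on its own, as described above. The helper name lamboot_cmd is made up for illustration and is not part of LAM, and my-bhost.def is only the example hostfile name used elsewhere in this thread.

```shell
# Hypothetical helper: print the lamboot command line to use.
# Assumes lamboot reads $PBS_NODEFILE by itself inside a PBS allocation.
lamboot_cmd() {
    fallback=${1:-my-bhost.def}   # hostfile to use outside PBS (assumed name)
    if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
        # Inside an allocation: no hostfile argument needed.
        echo "lamboot -v"
    else
        # No PBS node list visible: fall back to an explicit hostfile.
        echo "lamboot -v $fallback"
    fi
}
```

Calling lamboot_cmd only prints the command, so you can inspect it before running it by hand.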

> Also, I don't see any etc subdirectories for the installation.
> Does this directory get installed by default? (I want to pass
> instructions along to the admin in case he needs to rebuild it).

This is your problem then. The etc directory is installed by default
by the LAM/MPI makefiles, but its location can be overridden with the
[--sysconfdir=DIR] option to configure. You may want to ask the
sysadmin to send you the configure string they used, and possibly
reinstall LAM on those machines, paying close attention to whether
'make install' fails at any point. It could be that 'make install'
failed before creating the etc directory, and the sysadmin didn't
catch it.
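
As a rough sketch, something like the following could be run on each node to check whether a given install prefix has a usable etc directory. The function name check_lam_etc is hypothetical, and it only tests for the two files named in this thread (lam-bhost.def and lam-helpfile). A complete install may contain more.

```shell
# check_lam_etc PREFIX
# Report whether PREFIX/etc holds the two files lamboot was looking for.
# Returns 0 if both are readable, 1 if either is missing.
check_lam_etc() {
    prefix=$1
    rc=0
    for f in lam-bhost.def lam-helpfile; do
        if [ -r "$prefix/etc/$f" ]; then
            echo "found: $prefix/etc/$f"
        else
            echo "missing: $prefix/etc/$f"
            rc=1
        fi
    done
    return $rc
}
```

For example, 'check_lam_etc /usr/local' tests the default location. If the admin used --sysconfdir, the files will live in that directory instead and this check does not apply as written.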

Let me know if that helps.

-- Josh

>
> Thanks!
>
> Jeff
>
>> Jeff,
>>
>> It seems that the install is not quite complete or an environment
>> variable is set improperly.
>> As a sanity check, make sure you don't have $LAMHOME set on any
>> of the machines.
>>
>> The lamboot problem is likely due to not finding the default
>> hostfile [lam-bhost.def] (along with the helpfiles) on one of the
>> machines (o1). These are installed in $PREFIX/etc (by default
>> /usr/local/etc). I would look around on the machine to see if the
>> files [lam-bhost.def] and [lam-helpfile] are installed properly.
>> If they are in an odd directory (say /san/lam-7.1.1/etc), you
>> could try setting $LAMHOME to the root of that directory
>> (/san/lam-7.1.1), and see if that helps at all.
>>
>> As a temporary workaround, you could see if lamboot works
>> properly with a local hostfile:
>> $ cat my-bhost.def
>> localhost
>> $ lamboot -v my-bhost.def
>>
>> -- Josh
>>
>> On Feb 27, 2006, at 10:46 AM, Jeffrey B. Layton wrote:
>>
>>> Hello,
>>>
>>> I'm trying to run a code built with PGI 6.0 and LAM-7.1.1
>>> on an Opteron system (SLES 9, SP2). The code builds
>>> correctly, but when I try to lamboot I get the following
>>> error message:
>>>
>>> n-1<29905> ssi:boot:base:linear: booting n0 (o1)
>>> base: cannot find process schema (null): No such file or directory
>>> --------------------------------------------------------------------
>>>
>>> *** Oops -- I cannot open the LAM help file.
>>> *** I tried looking for it in the following places:
>>> ***
>>> *** $HOME/lam-helpfile
>>> *** $HOME/lam-7.1.1-helpfile
>>> *** $HOME/etc/lam-helpfile
>>> *** $HOME/etc/lam-7.1.1-helpfile
>>> *** $LAMHELPDIR/lam-helpfile
>>> *** $LAMHELPDIR/lam-7.1.1-helpfile
>>> *** $LAMHOME/etc/lam-helpfile
>>> *** $LAMHOME/etc/lam-7.1.1-helpfile
>>> *** $SYSCONFDIR/lam-helpfile
>>> *** $SYSCONFDIR/lam-7.1.1-helpfile
>>> ***
>>> *** You were supposed to get help on the program "hboot"
>>> *** about the topic "cant-parse-config"
>>> ***
>>> *** Sorry!
>>> --------------------------------------------------------------------
>>>
>>>
>>>
>>> So I assume something is wrong and I try using recon to see what's
>>> going on. Here is the output from the first node:
>>>
>>>
>>> n-1<25563> ssi:boot:base:linear: booting n0 (o1)
>>> n-1<25563> ssi:boot:base:linear: Failed to boot n0 (o1)
>>> n-1<25563> ssi:boot:base:linear: aborted!
>>> --------------------------------------------------------------------
>>> *** Oops -- I cannot open the LAM help file.
>>> *** I tried looking for it in the following places:
>>> ***
>>> *** $HOME/lam-helpfile
>>> *** $HOME/lam-7.1.1-helpfile
>>> *** $HOME/etc/lam-helpfile
>>> *** $HOME/etc/lam-7.1.1-helpfile
>>> *** $LAMHELPDIR/lam-helpfile
>>> *** $LAMHELPDIR/lam-7.1.1-helpfile
>>> *** $LAMHOME/etc/lam-helpfile
>>> *** $LAMHOME/etc/lam-7.1.1-helpfile
>>> *** $SYSCONFDIR/lam-helpfile
>>> *** $SYSCONFDIR/lam-7.1.1-helpfile
>>> ***
>>> *** You were supposed to get help on the program "recon"
>>> *** about the topic "unhappiness"
>>> ***
>>> *** Sorry!
>>> --------------------------------------------------------------------
>>>
>>>
>>> I assume something is wrong with the installation. Any ideas?
>>> (I didn't do the build nor the installation).
>>>
>>> Thanks!
>>>
>>> Jeff
>>>
>>>
>>> _______________________________________________
>>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>>
>> ----
>> Josh Hursey
>> jjhursey_at_[hidden]
>> http://www.lam-mpi.org/
>>

----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/