LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-12-01 22:06:47


I would agree with Glen: download a tarball and compile LAM/MPI from
source. We put a lot of effort into making compilation and installation
of LAM/MPI as simple as possible. Check out the official LAM/MPI
Installation Guide (both in the tarball and on the web site). Don't be
alarmed at its length -- there are "quick start" sections if you know
what you're doing. The bulk of the manual laboriously explains each
command line option, etc., so there are very few people on the planet
besides me who have read it cover-to-cover. :-)
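
For orientation, here is a rough sketch of what a source build usually
looks like. It assumes the tarball is named lam-7.1.1.tar.gz, that it
unpacks into a directory of the same name, and that /opt/lam-7.1.1 is
an acceptable install prefix; all three are illustrative assumptions,
and the Installation Guide remains the authority on the actual options.

```shell
# Sketch only: unpack a LAM/MPI source tarball and install it under a prefix.
# Nothing runs until build_lam is called.
build_lam() (
    # The body runs in a subshell, so the caller's working directory
    # is left unchanged.
    tarball="$1"    # assumed name, e.g. lam-7.1.1.tar.gz
    prefix="$2"     # assumed install prefix, e.g. /opt/lam-7.1.1

    # Assumption: the tarball unpacks into a directory named after it.
    srcdir=$(basename "$tarball" .tar.gz)

    tar xzf "$tarball" &&
    cd "$srcdir" &&
    ./configure --prefix="$prefix" &&
    make &&
    make install
)

# Example call (commented out so that sourcing this file builds nothing):
# build_lam lam-7.1.1.tar.gz /opt/lam-7.1.1
```

Each step is chained with &&, so a failed configure or make stops the
build instead of installing a partial tree.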

On Dec 1, 2004, at 8:46 AM, Susanne Hemker wrote:

> Hi Jeff,
> The only problem with that is that my nodes are running Red Hat 7.3,
> and the LAM 7.1 RPMs seem to be for Red Hat 9.
> Any other ideas ?
> Thanks,
> Susanne
>
>>>> jsquyres_at_[hidden] 12/01/04 09:48AM >>>
> Can you upgrade to a later version of LAM, such as 7.1.1? The 6.5.x
> series is actually no longer supported. The 7.1.x series contains a
> *LOT* more functionality and is source compatible with the 6.5.x
> series. It also has a lot more debugging output for diagnosing lamboot
> problems like this (i.e., the "lamboot -d" output is much more
> verbose).
>
>
> On Dec 1, 2004, at 6:47 AM, Susanne Hemker wrote:
>
>> Hi everybody,
>> I am trying to start LAM, but it won't finish the boot process. Since
>> "recon -v boot_schema_file" worked fine, I changed the boot schema
>> file to use IP addresses instead of node names, as recommended in the
>> FAQ, but this did not resolve the problem. Can anybody tell me what I
>> might have done wrong or need to change in order to get LAM to boot?
>>
>> Here's the output from the boot attempt:
>>
>> n65(15)% lamboot -d -v boot_schema_file
>>
>> LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University
>>
>> lamboot: boot schema file: boot_schema_file
>> lamboot: opening hostfile boot_schema_file
>> lamboot: found the following hosts:
>> lamboot: n0 192.168.0.65
>> lamboot: n1 192.168.0.66
>> lamboot: n2 192.168.0.67
>> lamboot: n3 192.168.0.68
>> lamboot: resolved hosts:
>> lamboot: n0 192.168.0.65 --> 192.168.0.65
>> lamboot: n1 192.168.0.66 --> 192.168.0.66
>> lamboot: n2 192.168.0.67 --> 192.168.0.67
>> lamboot: n3 192.168.0.68 --> 192.168.0.68
>> lamboot: found 4 host node(s)
>> lamboot: origin node is 0 (192.168.0.65)
>> Executing hboot on n0 (192.168.0.65 - 1 CPU)...
>> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I "-H 192.168.0.65 -P 32807 -n 0 -o 0 ""
>> hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
>> hboot: found /opt/lam-6.5.7/bin/lamd
>> hboot: performing tkill
>> hboot: tkill
>> hboot: booting...
>> hboot: fork /opt/lam-6.5.7/bin/lamd
>> [1] 1628 lamd -H 192.168.0.65 -P 32807 -n 0 -o 0 -d
>> hboot: attempting to execute
>> Executing hboot on n1 (192.168.0.66 - 1 CPU)...
>> lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n echo $SHELL"
>> lamboot: got remote shell /bin/tcsh
>> lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.65 -P 32807 -n 1 -o 0 ""
>> hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
>> hboot: found /opt/lam-6.5.7/bin/lamd
>> hboot: performing tkill
>> hboot: tkill
>> hboot: booting...
>> hboot: fork /opt/lam-6.5.7/bin/lamd
>> [1] 25996 lamd -H 192.168.0.65 -P 32807 -n 1 -o 0 -d
>>
>> -----------------------------------------------------------------------------
>> lamboot encountered some error (see above) during the boot process,
>> and will now attempt to kill all nodes that it was previously able to
>> boot (if any).
>>
>> Please wait for LAM to finish; if you interrupt this process, you may
>> have LAM daemons still running on remote nodes.
>> -----------------------------------------------------------------------------
>> wipe ...
>>
>> LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University
>>
>> Executing tkill on n0 (192.168.0.65)...
>> Executing tkill on n1 (192.168.0.66)...
>> lamboot did NOT complete successfully
>>
>>
>> Thanks for any suggestions,
>>
>> Susanne
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/