Download the tar.gz source instead of the RPMs.
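
A source build usually looks roughly like the sketch below. The tarball name, the directory it unpacks to, and the /opt/lam-7.1.1 prefix are assumptions (the prefix only mirrors the /opt/lam-6.5.7 layout in the quoted output), so adjust them to your setup. The same installation would likely be needed on every node unless the prefix is on a shared filesystem.

```shell
#!/bin/sh
# Sketch only: the names below are assumptions, not taken from the thread.
LAM_VERSION=7.1.1
TARBALL="lam-${LAM_VERSION}.tar.gz"      # assumed tarball file name
PREFIX="/opt/lam-${LAM_VERSION}"         # assumed install prefix

if [ -f "$TARBALL" ]; then
    # Unpack, then run the usual configure / make / make install sequence.
    tar xzf "$TARBALL"
    # Assumes the archive unpacks to lam-<version>; "make install" may need root.
    cd "lam-${LAM_VERSION}" && ./configure --prefix="$PREFIX" && make && make install
else
    echo "Download $TARBALL first, then re-run this script."
fi
```

After installing, make sure "$PREFIX/bin" comes first in the PATH set by your shell startup files on all nodes, so that lamboot picks up the new hboot and lamd rather than the old 6.5.7 ones.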
On Dec 1, 2004, at 10:46 AM, Susanne Hemker wrote:
> Hi Jeff,
> The only problem with that is that my nodes are running Red Hat 7.3
> and the 7.1 RPMs seem to be for Red Hat 9.
> Any other ideas?
> Thanks,
> Susanne
>
> >>> jsquyres_at_[hidden] 12/01/04 09:48AM >>>
> Can you upgrade to a later version of LAM, such as 7.1.1? The 6.5.x
> series is actually no longer supported. The 7.1.x series contains a
> *LOT* more functionality and is source compatible with the 6.5.x
> series. It also has a lot more debugging output for diagnosing lamboot
> problems like this (i.e., the "lamboot -d" output is much more
> verbose).
>
>
> On Dec 1, 2004, at 6:47 AM, Susanne Hemker wrote:
>
>> Hi everybody,
>> I am trying to start LAM, but it won't finish the boot process. Since
>> "recon -v boot_schema_file" worked fine, I changed the boot
>> schema file to use IP addresses instead of the node names, as recommended
>> in the FAQ, but this did not resolve the problem. Can anybody tell me
>> what I might have done wrong or need to change in order to get LAM to
>> boot?
>>
>> Here's the output from the boot attempt:
>>
>> n65(15)% lamboot -d -v boot_schema_file
>>
>> LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University
>>
>> lamboot: boot schema file: boot_schema_file
>> lamboot: opening hostfile boot_schema_file
>> lamboot: found the following hosts:
>> lamboot: n0 192.168.0.65
>> lamboot: n1 192.168.0.66
>> lamboot: n2 192.168.0.67
>> lamboot: n3 192.168.0.68
>> lamboot: resolved hosts:
>> lamboot: n0 192.168.0.65 --> 192.168.0.65
>> lamboot: n1 192.168.0.66 --> 192.168.0.66
>> lamboot: n2 192.168.0.67 --> 192.168.0.67
>> lamboot: n3 192.168.0.68 --> 192.168.0.68
>> lamboot: found 4 host node(s)
>> lamboot: origin node is 0 (192.168.0.65)
>> Executing hboot on n0 (192.168.0.65 - 1 CPU)...
>> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.0.65 -P 32807 -n 0 -o 0 ""
>> hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
>> hboot: found /opt/lam-6.5.7/bin/lamd
>> hboot: performing tkill
>> hboot: tkill
>> hboot: booting...
>> hboot: fork /opt/lam-6.5.7/bin/lamd
>> [1] 1628 lamd -H 192.168.0.65 -P 32807 -n 0 -o 0 -d
>> hboot: attempting to execute
>> Executing hboot on n1 (192.168.0.66 - 1 CPU)...
>> lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n echo $SHELL"
>> lamboot: got remote shell /bin/tcsh
>> lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.65 -P 32807 -n 1 -o 0 ""
>> hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
>> hboot: found /opt/lam-6.5.7/bin/lamd
>> hboot: performing tkill
>> hboot: tkill
>> hboot: booting...
>> hboot: fork /opt/lam-6.5.7/bin/lamd
>> [1] 25996 lamd -H 192.168.0.65 -P 32807 -n 1 -o 0 -d
>>
>> -----------------------------------------------------------------------------
>> lamboot encountered some error (see above) during the boot process,
>> and will now attempt to kill all nodes that it was previously able to
>> boot (if any).
>>
>> Please wait for LAM to finish; if you interrupt this process, you may
>> have LAM daemons still running on remote nodes.
>> -----------------------------------------------------------------------------
>> wipe ...
>>
>> LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University
>>
>> Executing tkill on n0 (192.168.0.65)...
>> Executing tkill on n1 (192.168.0.66)...
>> lamboot did NOT complete successfully
>>
>>
>> Thanks for any suggestions,
>>
>> Susanne
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>