LAM/MPI General User's Mailing List Archives

From: Susanne Hemker (shemker2_at_[hidden])
Date: 2004-12-01 10:46:47


Hi Jeff,
The only problem with that is that my nodes are running Red Hat 7.3,
and the LAM 7.1 RPMs seem to be for Red Hat 9.
Any other ideas?
Thanks,
Susanne

>>> jsquyres_at_[hidden] 12/01/04 09:48AM >>>
Can you upgrade to a later version of LAM, such as 7.1.1? The 6.5.x
series is actually no longer supported. The 7.1.x series contains a
*LOT* more functionality and is source compatible with the 6.5.x
series. It also has a lot more debugging output for diagnosing lamboot
problems like this (i.e., the "lamboot -d" output is much more
verbose).

On Dec 1, 2004, at 6:47 AM, Susanne Hemker wrote:

> Hi everybody,
> I am trying to start LAM, but it won't finish the boot process. Since
> "recon -v boot_schema_file" worked fine, I changed the boot schema
> file to use IP addresses instead of node names, as recommended in the
> FAQ, but this did not resolve the problem. Can anybody tell me what I
> might have done wrong or need to change in order to get LAM to boot?
>
> Here's the output from the boot attempt:
>
> n65(15)% lamboot -d -v boot_schema_file
>
> LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University
>
> lamboot: boot schema file: boot_schema_file
> lamboot: opening hostfile boot_schema_file
> lamboot: found the following hosts:
> lamboot: n0 192.168.0.65
> lamboot: n1 192.168.0.66
> lamboot: n2 192.168.0.67
> lamboot: n3 192.168.0.68
> lamboot: resolved hosts:
> lamboot: n0 192.168.0.65 --> 192.168.0.65
> lamboot: n1 192.168.0.66 --> 192.168.0.66
> lamboot: n2 192.168.0.67 --> 192.168.0.67
> lamboot: n3 192.168.0.68 --> 192.168.0.68
> lamboot: found 4 host node(s)
> lamboot: origin node is 0 (192.168.0.65)
> Executing hboot on n0 (192.168.0.65 - 1 CPU)...
> lamboot: attempting to execute "hboot -t -c lam-conf.lam -d -v -I " -H 192.168.0.65 -P 32807 -n 0 -o 0 ""
> hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
> hboot: found /opt/lam-6.5.7/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /opt/lam-6.5.7/bin/lamd
> [1] 1628 lamd -H 192.168.0.65 -P 32807 -n 0 -o 0 -d
> hboot: attempting to execute
> Executing hboot on n1 (192.168.0.66 - 1 CPU)...
> lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n echo $SHELL"
> lamboot: got remote shell /bin/tcsh
> lamboot: attempting to execute "/usr/bin/ssh 192.168.0.66 -n hboot -t -c lam-conf.lam -d -v -s -I "-H 192.168.0.65 -P 32807 -n 1 -o 0 ""
> hboot: process schema = "/opt/lam-6.5.7/etc/lam-conf.lam"
> hboot: found /opt/lam-6.5.7/bin/lamd
> hboot: performing tkill
> hboot: tkill
> hboot: booting...
> hboot: fork /opt/lam-6.5.7/bin/lamd
> [1] 25996 lamd -H 192.168.0.65 -P 32807 -n 1 -o 0 -d
> -----------------------------------------------------------------------------
> lamboot encountered some error (see above) during the boot process,
> and will now attempt to kill all nodes that it was previously able to
> boot (if any).
>
> Please wait for LAM to finish; if you interrupt this process, you may
> have LAM daemons still running on remote nodes.
> -----------------------------------------------------------------------------
> wipe ...
>
> LAM 6.5.7/MPI 2 C++/ROMIO - Indiana University
>
> Executing tkill on n0 (192.168.0.65)...
> Executing tkill on n1 (192.168.0.66)...
> lamboot did NOT complete successfully
>
>
> Thanks for any suggestions,
>
> Susanne
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden] 
{+} http://www.lam-mpi.org/ 