Hi Vishal,
This error has gone away. I guess the admin made some changes to the
hardware etc., so that must be it. But now I get this libcr error:
lamboot: error while loading shared libraries: libcr.so.0: cannot open shared object file: No such file or directory
lamhalt: error while loading shared libraries: libcr.so.0: cannot open shared object file: No such file or directory
From the archives it's clear that this is more of a Unix issue, but how
can I find where this BLCR library is installed on my system (i.e., which
paths should I look under)?
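(A minimal sketch of such a search, for illustration only. The helper name find_lib and the list of directories are assumptions, not anything LAM or BLCR prescribes; the only path taken from this message is the LAM prefix reported by laminfo.)

```shell
#!/bin/sh
# find_lib is a hypothetical helper: print every file whose name starts
# with the given library name under each directory passed after it.
find_lib() {
    lib="$1"
    shift
    for dir in "$@"; do
        # Skip directories that do not exist on this machine.
        [ -d "$dir" ] || continue
        find "$dir" -name "${lib}*" 2>/dev/null
    done
}

# Assumed candidate locations; adjust for the local site.
find_lib libcr.so /usr/lib /usr/local/lib /usr/local/mpi/LINUX/lam7/lib

# If lamboot is on the PATH, list the shared libraries it cannot resolve.
if command -v lamboot >/dev/null 2>&1; then
    ldd "$(command -v lamboot)" | grep 'not found'
fi
```

If the search turns up a directory containing libcr.so.0, adding that directory to LD_LIBRARY_PATH on all the nodes (in the same way the quoted reply below suggests for prefix/lib) would presumably let the loader find it.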
This is the output from laminfo:
Redstone[61] pushkar$ laminfo
LAM/MPI: 7.0.5
Prefix: /usr/local/mpi/LINUX/lam7
Architecture: i686-pc-linux-gnu
Configured by: morgan
Configured on: Fri May 21 12:59:45 CDT 2004
Configure host: Redstone.ERC.MsState.Edu
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (Module v0.5)
SSI boot: rsh (Module v1.0)
SSI coll: lam_basic (Module v7.0)
SSI coll: smp (Module v1.0)
SSI rpi: crtcp (Module v1.0.1)
SSI rpi: lamd (Module v7.0)
SSI rpi: sysv (Module v7.0)
SSI rpi: tcp (Module v7.0)
SSI rpi: usysv (Module v7.0)
Thanks,
Pushkar
> -----Original Message-----
> From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Vishal Sahay
> Sent: Friday, May 28, 2004 10:46 PM
> To: General LAM/MPI mailing list
> Subject: Re: LAM: unable to boot
>
>
> It seems that LAM has been installed with the --enable-shared option. You
> don't seem to have your LD_LIBRARY_PATH set properly on all the nodes.
> Specifically, it should point to prefix/lib (where prefix is the directory
> where LAM is installed) on all the nodes. The best way would be to set it
> in your dot files.
>
> Hope this helps...
>
> -Vishal
>
>
> On Thu, 27 May 2004, Pushkar Pradhan wrote:
>
> # I'm unable to boot my nodes with lam 7.0.5 (recently installed). This is my
> # boot command in the PBS script:
> # lamboot -v -ssi boot rsh -ssi rsh_agent "rsh" $PBS_NODEFILE
> #
> # And below are the errors in the error file. The output file doesn't contain
> # the message "topology done", which I guess is printed if it's successful.
> #
> # n-1<1718> ssi:boot:base:linear: booting n0 (Empire-09-14)
> # n-1<1718> ssi:boot:base:linear: booting n1 (Empire-09-02)
> # ERROR: LAM/MPI unexpectedly received the following on stderr:
> # hboot: error while loading shared libraries: liblam.so.0: cannot open shared
> # object file: No such file or directory
> #
> # -----------------------------------------------------------------------------
> # LAM attempted to execute a process on the remote node "Empire-09-02",
> # but received some output on the standard error.
> #
> # LAM tried to use the remote agent command "rsh"
> # to invoke "hboot" on the remote node.
> #
> # This can indicate an authentication error with the remote agent, or
> # can indicate an error in your $HOME/.cshrc, $HOME/.login, or
> # $HOME/.profile files. The following is a list of items that you may
> # wish to check on the remote node:
> # .......
> # .......
> #
> # I tried pasting the rsh command and this is the result:
> # Redstone[1153] pushkar$ rsh Empire-09-02 -n hboot -t -c lam-conf.lamd -v -sessionsuffix pbs-59687.Empire -s -I "-H 172.16.9.14 -P 32837 -n 1 -o 0"
> # poll: protocol failure in circuit setup
> #
> # I made sure all the libs and binaries are in my path.
> # Can anyone tell what's wrong? Thanks,
> # Pushkar
> #
> # _______________________________________________
> # This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> #