You might want to check your "dot" files (similar to what is shown below,
or perhaps just "ssh othernode env | grep LD_") and ensure that your
LD_LIBRARY_PATH is set properly on the remote nodes. Remote shells are
typically non-interactive and read only some dot files, in a particular
order, so it may not be set to what you think it should be.
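
As a rough sketch of that check, a small shell function could loop over
your nodes and print the LD_LIBRARY_PATH that a remote shell actually
sees. The RSH variable, the function name, and the node names are all
assumptions for illustration; substitute whatever remote agent your setup
uses.

```shell
# RSH is a hypothetical override for the remote-shell command (ssh, rsh, ...).
RSH="${RSH:-ssh}"

# Print the LD_LIBRARY_PATH seen by a non-interactive shell on each node.
show_remote_ld_path() {
    for node in "$@"; do
        # "env" runs without a login session, so only the dot files that
        # such a shell reads will have had any effect.
        value=$($RSH "$node" env | sed -n 's/^LD_LIBRARY_PATH=//p')
        printf '%s: %s\n' "$node" "${value:-(not set)}"
    done
}

# Example (node names are placeholders):
#   show_remote_ld_path node1 node2
```

If the values differ between nodes, or differ from what an interactive
login shows, the dot files are the first place to look.
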
Also, editing /etc/ld.so.conf is not enough -- you must also run ldconfig
to make those changes take effect.
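
For example, something like the following could confirm that a library
actually made it into the linker cache after running ldconfig. This is
only a sketch: the LDCONFIG variable, the function name, and libfoo.so are
placeholders, and it assumes a Linux-style ldconfig that supports -p
(print the current cache).

```shell
# LDCONFIG is an assumed override; ldconfig commonly lives in /sbin.
LDCONFIG="${LDCONFIG:-/sbin/ldconfig}"

# Succeed if the linker cache lists an entry containing the string $1.
lib_in_cache() {
    $LDCONFIG -p | grep -F -- "$1" >/dev/null
}

# Typical sequence after editing /etc/ld.so.conf (run ldconfig as root):
#   /sbin/ldconfig
#   lib_in_cache libfoo.so && echo "libfoo.so is in the cache"
```

This has to be repeated on every node whose ld.so.conf was changed.
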
Sidenote: some sysadmins believe that putting commonly-used shared
libraries on network-mounted filesystems, and then listing those
directories in ld.so.conf, is a Bad Idea. If the network file server goes
down, the potential for disaster is large. These people generally prefer
to physically install the libraries on all nodes. They also tend to
dislike the name "/usr/local" as a mount point for a network drive (it's a
contradiction in terms :-). I don't advocate either method -- I just
mention these things to make you aware of the issues.
On Tue, 15 Jul 2003, Jim Procter wrote:
> No idea if this will help, but :
>
> On Tuesday 15 July 2003 16:39, Michael Lees wrote:
> <snip>
> > However I *don't* get the error if step 1.) becomes 'lamboot' or step
> > 2.) becomes 'mpicc hello.c'
> >
> > ps. another note...
> > I also tried adding /usr/local/lib to /etc/ld.so.conf but still no joy?
> >
>
> Step 2 makes sense - you don't link in the missing library. Step 1 suggests
> that your schema is doing something funny to the library path...
>
> Try this :
>
> lamboot mynodes
> lamexec N env | grep LD_LIBR
>
> So you should see the same entry for LD_LIBRARY_PATH, everywhere. You could
> also do 'ldd' on the executable to see if the dependencies are all being
> resolved as the execution goes through the LAM daemon.
>
> j.
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/