LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-07-12 08:51:27


Which scheduler are you using?
 
If you're not using the rsh/ssh starter for LAM, at least some of these
issues go away. For example, if you're using SLURM or Torque, your
local environment will be replicated on the remote nodes (i.e., no need
to tweak shell startup files such as .bashrc, $HOME/.ssh/environment,
etc.).
 
FWIW, we did this a bit better in Open MPI. If you do the following:
 
    /path/to/bin/mpirun ...
or
    mpirun --prefix /path/to ...
 
Then OMPI will magically set your PATH/LD_LIBRARY_PATH on the remote
node (if it needs to).
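
To make concrete what "setting PATH/LD_LIBRARY_PATH from a prefix" amounts to, here is a minimal sketch in POSIX shell. It is not how Open MPI or LAM implement this internally; `run_with_prefix` is a hypothetical helper name, and the sketch assumes the installation keeps its binaries in `<prefix>/bin` and its libraries in `<prefix>/lib`.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM or Open MPI): run a command with
# PATH and LD_LIBRARY_PATH extended for a single installation prefix.
# The function body is a subshell, so the caller's environment is untouched.
run_with_prefix() (
    prefix=$1
    shift
    # Prepend the prefix's directories; ${VAR:+:$VAR} avoids a trailing
    # colon when the variable was previously unset or empty.
    PATH="$prefix/bin${PATH:+:$PATH}"
    LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export PATH LD_LIBRARY_PATH
    "$@"
)
```

For example, `run_with_prefix /opt/lam-7.1.2 sh -c 'echo "$LD_LIBRARY_PATH"'` prints a value that starts with /opt/lam-7.1.2/lib. Whether something like this can be hooked into the remote start of a daemon depends on the launcher, which is exactly the limitation discussed in the quoted thread below.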

________________________________

        From: lam-bounces_at_[hidden] [mailto:lam-bounces_at_[hidden]] On Behalf Of Dennis van Dok
        Sent: Monday, July 10, 2006 10:52 AM
        To: General LAM/MPI mailing list
        Subject: Re: LAM: LD_LIBRARY_PATH for lamd
        
        
        Bogdan Costescu wrote:

                On Mon, 10 Jul 2006, Dennis van Dok wrote:
                
                  

                        Is it possible to have LD_LIBRARY_PATH set up for each instance of
                        lamd on my nodes? That way, if a process is started by lamd, and by
                        inheritance of the environment,
                            

                
                You have the right idea, but you forget some details :-)
                  

                [...]
                
                However, setting it up for lamd requires the environment to be set
                in one of the shell start-up files, like the ones you mentioned. This
                is because a remote lamd is started[1] via a simple remote shell;
                there is no mechanism to set environment variables on the remote nodes
                before running lamd, so they have to be set in the shell start-up
                file(s).
                  

        OK, point taken. The issue I'm really having is this: I have a
        cluster supporting more than one version of MPI. Ideally, each version
        lives in its own space, so I have /opt/lam-6.x.x, /opt/lam-7.x.x,
        /opt/mpich-x.x, etc. Now if a scheduler wants to run a six-node
        lam-7.1.2 job, it would issue /opt/lam-7.1.2/bin/lamboot with the right
        parameters, and on each of the six nodes /opt/lam-7.1.2/bin/lamd would
        be started (through ssh, I suppose). Now lamd is linked with
        /opt/lam-7.1.2/lib/liblam.so.0, which it can find by virtue of its RPATH,
        but ideally it should think: hey, I was compiled with
        --prefix=/opt/lam-7.1.2, let's add /opt/lam-7.1.2/lib to my
        LD_LIBRARY_PATH environment!
        
        Your dirty trick goes a long way, I suppose, and I may use it
        right now (thank you!), but a cleaner solution would be to patch lamd.
        You can see why the /etc/profile.d/ scripts are right out.
        
        Thank you,
        
        Dennis van Dok
        
        --
        D.H. van Dok :: Software Engineer :: www.nikhef.nl :: www.vl-e.nl