
LAM/MPI General User's Mailing List Archives


From: Eric Thibodeau (kyron_at_[hidden])
Date: 2006-08-09 11:44:48


Hello everyone,

        I am currently trying to get my MPI application to run in a somewhat heterogeneous environment. Here is how I compile my apps:

1- Run the following command on all architectures (head node and slave nodes):

        mpicc -lm -lX11 -o mandelbrot-mpi.$(laminfo -arch | cut -d' ' -f10) mandelbrot-mpi.c

This generates the following binaries:

        mandelbrot-mpi.i686-pc-linux-gnu
        mandelbrot-mpi.x86_64-pc-linux-gnu
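The compile step can be wrapped in a small script (a sketch, not part of my actual setup: it assumes `laminfo` is on the PATH and falls back to `uname -m` otherwise, and it only prints the command it would run). Taking the last whitespace-delimited field with awk is a bit less fragile than `cut -d' ' -f10`, and library flags conventionally go after the source file:

```shell
#!/bin/sh
# Derive the architecture suffix from laminfo; fall back to uname -m
# if laminfo is not installed on this node (hypothetical fallback).
arch=$(laminfo -arch 2>/dev/null | awk '{print $NF}')
arch=${arch:-$(uname -m)}

# Library flags after the source file, as most linkers expect.
echo mpicc -o "mandelbrot-mpi.$arch" mandelbrot-mpi.c -lm -lX11
```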

2- Start lam-mpi on the desired nodes with lamboot:

        lamboot small_hst

Where small_hst contains:

        headless
        thinkbig1
        thinkbig21

The "headless" host is the head node (a dual Opteron, x86_64) and the "thinkbig" nodes are Athlon XP machines. lamboot starts with no complaints.
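As an aside, the boot schema format also accepts a per-host CPU count (syntax per the LAM bhost format; the hostnames are the ones above, the counts here are made up for illustration):

```
# small_hst with optional CPU counts (LAM boot schema syntax)
headless cpu=2
thinkbig1 cpu=1
thinkbig21 cpu=1
```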

3- (Try to) use mpiexec to launch the parallel application:

        mpiexec -n 4 -arch i686-pc-linux-gnu $PWD/mandelbrot-mpi.i686-pc-linux-gnu 100 200 200 1 : -arch x86_64-pc-linux-gnu $PWD/mandelbrot-mpi.x86_64-pc-linux-gnu 100 200 200 1

The output I get is:

Use of uninitialized value in concatenation (.) or string at /usr/bin/mpiexec line 641.
Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.
Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.
Use of uninitialized value in concatenation (.) or string at /usr/bin/mpiexec line 641.
Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.
Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.
Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.
Use of uninitialized value in concatenation (.) or string at /usr/bin/mpiexec line 641.
Use of uninitialized value in pattern match (m//) at /usr/bin/mpiexec line 640.
/export/home/eric/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi.i686-pc-linux-gnu: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
/export/home/eric/1_Files/1_ETS/1_Maitrise/MGL810/Devoir2/mandelbrot-mpi.i686-pc-linux-gnu: error while loading shared libraries: liblamf77mpi.so.0: cannot open shared object file: No such file or directory
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
mpirun failed with exit status 252

Now, I noticed the library-loading error, and I get it even though I set the following in my ~/.bashrc (which is sourced by ~/.profile; sourcing it is the only thing ~/.profile does):

if [ "$(uname -m)" = "x86_64" ]
then
        export LD_LIBRARY_PATH="/usr/lib64"
else
        export LD_LIBRARY_PATH="/usr/lib"
fi

This seems to have no impact unless I am logging in interactively (I made sure that ~/.bashrc was not being bypassed within the script in that specific case).
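The behaviour can be demonstrated locally (a sketch, assuming bash on the nodes; note that bash special-cases shells started by rshd/sshd, so the remote behaviour can differ per distribution):

```shell
# A plain non-interactive bash reads neither ~/.profile nor ~/.bashrc
# (unless BASH_ENV is set), so variables exported there never reach it.
env -i bash -c 'echo "LD_LIBRARY_PATH is ${LD_LIBRARY_PATH:-unset}"'
# → LD_LIBRARY_PATH is unset

# To see what a remote command actually inherits:
#   ssh thinkbig1 'echo $LD_LIBRARY_PATH'
```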

Now, the questions:

1- I am not sure I am using mpiexec correctly (I based my command line on the FAQ and the manpage).
2- How do I get LAM/MPI to look in the correct path for the libraries? The manpage for lamboot claims that ~/.profile is sourced by default on the nodes, but I have no way of confirming this.
3- Is setting LD_LIBRARY_PATH the real solution to my problem, or am I missing something else?
4- This application has process 0 perform some display. That process _has_ to be the one running on the host named "headless", where all commands are launched. Am I right to assume that process 0 will always be on the first node?
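For reference, the relevant structure of the program looks roughly like this (a simplified sketch, not the actual mandelbrot-mpi.c): only the rank-0 process does the display, and rank placement can be checked with MPI_Get_processor_name rather than assumed from the boot schema order.

```c
/* Sketch only: guard display work by rank, and print which host
 * each rank landed on instead of assuming scheduling order. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    if (rank == 0)
        printf("rank 0 is on %s -- display goes here\n", host);
    else
        printf("rank %d is on %s -- compute only\n", rank, host);

    MPI_Finalize();
    return 0;
}
```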

Thanks for the info in advance,

Eric Thibodeau

PS: I am also trying to do this with OpenMPI; if it's easier to accomplish under OpenMPI, please don't hesitate to tell me, since I found no evidence that it is (I have also decided not to cross-post this to the OpenMPI list for the moment).