
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-03-19 08:06:20


It looks like you are unintentionally mixing LAM/MPI and MPICH -- two
entirely different MPI implementations. This will not work.

This list is in support of LAM/MPI. You'll need to contact the MPICH
maintainers directly for support of their product.

The error message "One of the processes started by mpirun..." is a LAM
error message. So it's likely that /usr/local/bin/mpirun is LAM's
mpirun. But anything with "mpich" in the command name or path is
likely to be an MPICH command (your mail is a bit confusing because you
cite both /usr/local/bin/mpicc and /usr/local/mpich/bin/mpicc).
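
As a rough illustration of the rule of thumb above, the following sketch prints where mpicc and mpirun resolve on the current $PATH and applies the "mpich in the path" heuristic. The function name guess_suite is made up for this example, and the result is only a guess: a path without "mpich" in it is not proof that the command belongs to LAM/MPI.

```shell
# Guess the MPI suite of a command from its resolved path.
# Heuristic only: a path containing "mpich" is likely an MPICH command.
# guess_suite is a hypothetical helper, not part of LAM/MPI or MPICH.
guess_suite() {
    case "$1" in
        *mpich*) echo "likely MPICH" ;;
        *)       echo "unknown (possibly LAM/MPI)" ;;
    esac
}

# Show which mpicc and mpirun the shell would actually run.
for cmd in mpicc mpirun; do
    resolved=$(command -v "$cmd" 2>/dev/null) || resolved=""
    if [ -n "$resolved" ]; then
        echo "$cmd -> $resolved: $(guess_suite "$resolved")"
    else
        echo "$cmd -> not found on PATH"
    fi
done
```

If the heuristic is inconclusive, asking the wrapper compiler what it would run usually settles it: LAM's mpicc accepts -showme and MPICH's mpicc accepts -show, and the include and library paths they print should show which installation is in use.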

It sounds like there may be confusion on which command belongs to which
suite, and which commands you should be using. Specifically: if you
compile with LAM/MPI, you need to run with LAM/MPI. If you compile
with MPICH, you need to run with MPICH (both implementations can
certainly peacefully co-exist on the same cluster -- it's usually a
matter of simply setting your $PATH in your shell startup files to
point to the one that you want to use).
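
A minimal sketch of the $PATH approach, assuming a Bourne-style shell and assuming the MPICH commands live in /usr/local/mpich/bin (the prefix given in the original mail). The helper name use_mpi is made up for this example; a line like the call below would typically go in a shell startup file.

```shell
# Put one MPI suite's bin directory first on PATH so that mpicc and
# mpirun both come from the same installation.
# use_mpi is a hypothetical helper name used only in this sketch.
use_mpi() {
    PATH="$1:$PATH"
    export PATH
}

# Assumption: MPICH was installed with --prefix=/usr/local/mpich.
use_mpi /usr/local/mpich/bin

# Confirm that compiler and launcher now resolve to the same directory.
command -v mpicc  || echo "mpicc not found on PATH"
command -v mpirun || echo "mpirun not found on PATH"
```

Once both commands resolve to the same directory, compile and run with the unqualified names (mpicc, mpirun) so the two steps cannot drift apart.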

If you didn't install this stuff, you might want to talk to the local
sysadmin that did to get the details. The first thing you need to do
is get all this stuff straight and get simple MPI executables running.
Then worry about MPE.

Note that MPE is a tool that comes with the MPICH distribution, but it
is independent of which MPI you are using. So you can certainly use it
with LAM/MPI.

On Mar 19, 2005, at 7:19 AM, Dominik Baenninger wrote:

> Dear LAM list
>
> I would like to check and test the performance of my cluster. Therefore I
> would like to make use of MPE, but I could not get a binary that works
> properly. I guess that I installed mpich incorrectly, but I could not
> figure out what I was doing wrong. Here are the results I get:
>
> I used the cpi.c code which comes along with mpich. I compiled this
> code first with
>
> mpicc -c checkperf.c
> mpicc -o _checkperf checkperf.o -L/usr/local/mpich/lib -llmpe -lmpe -lm
>
> The command "which mpicc" tells me that mpicc is /usr/local/bin/mpicc.
> When I run this program on a single machine I receive the message:
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 3497 failed on node n0 (192.1.1.2) due to signal 11.
> -----------------------------------------------------------------------------
>
> But when I compile the program with
>
> /usr/local/mpich/bin/mpicc -c checkperf.c
> /usr/local/mpich/bin/mpicc -o _checkperf checkperf.o -L/usr/local/mpich/lib -llmpe -lmpe -lm
>
> i.e. with the mpicc command located in a different place, the program run
> gives:
>
> Process 0 on shaw.music.home
> pi is approximately 3.1416009869231254, Error is 0.0000083333333323
> wall clock time = 0.000700
> Writing logfile....
> Finished writing logfile.
> -----------------------------------------------------------------------------
> It seems that [at least] one of the processes that was started with
> mpirun did not invoke MPI_INIT before quitting (it is possible that
> more than one process did not invoke MPI_INIT -- mpirun was only
> notified of the first one, which was on node n0).
>
> mpirun can *only* be used with MPI programs (i.e., programs that
> invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
> to run non-MPI programs over the lambooted nodes.
> -----------------------------------------------------------------------------
>
> I installed mpich-1.2.6 in the following way:
>
> cd /usr/local
> gzip -d mpchi-1.2.6.tar.gz
> tar -xvf mpchi-1.2.6.tar
> cd mpchi-1.2.6
> ./configure --prefix=/usr/local/mpich
> make
> make install
>
> Does anyone know what I am doing wrong?
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/