> A trap we fell into was that our system (Linux, not Tru64) had MPI include files for other MPI implementations tucked away here and there. We had to root out all non-LAM MPI files and "hide" them by moving them into subdirectories. It is possible that you are inadvertently picking up the wrong .h file. Did you compile MPI_C_SAMPLE.o with mpicc, or gcc (or pgcc or icc)? MPICH and LAM include files, for example, are absolutely incompatible with one another.
I had thought about this problem, so I installed LAM in its own directory
(configure --prefix=/usr/local/mpi/lam-6.5.9). I removed all the MPI files
(an old MPICH version) from /usr/local, and I checked the result with the
-showme option of the mpicc command.
I used mpicc. To show the command that mpicc actually runs, I did:
% which mpicc
> /usr/local/mpi/lam-6.5.9/bin/mpicc
% mpicc -showme -c MPI_C_SAMPLE.c
> cc -I/usr/local/mpi/lam-6.5.9/include -c MPI_C_SAMPLE.c
So I seem to be using the right .h files.
Then, for the link:
% mpicc -o hello MPI_C_SAMPLE.o -showme
> cc -I/usr/local/mpi/lam-6.5.9/include -L/usr/local/mpi/lam-6.5.9/lib -o hello MPI_C_SAMPLE.o -llammpio -lpmpi -llamf77mpi -lmpi -llam -laio -lutil
Without -showme, I got:
ld:
Unresolved:
lam_mpi_comm_world
lam_mpi_int
I was surprised to observe that the following does link:
cc -c MPI_C_SAMPLE.c
cc -I/usr/local/mpi/lam-6.5.9/include -L/usr/local/mpi/lam-6.5.9/lib -o
hello MPI_C_SAMPLE.o -llammpio -lpmpi -llamf77mpi -lmpi -llam -laio
-lutil
The link is OK, but execution is not! By compiling MPI_C_SAMPLE.c
without -I..., I used Compaq's version of the .h files.
At execution:
% lamboot -v hostfile
% mpirun -np 2 hello
-----------------------------------------------------------------------------
It seems that [at least] one of processes that was started with mpirun
did not invoke MPI_INIT before quitting (it is possible that more than
one process did not invoke MPI_INIT -- mpirun was only notified of the
first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
===========================================
I was asked:
> This doesn't look like a platform problem, actually.
I also have the feeling that it's not a platform problem, but I don't
understand...
> Did you call MPI_Init somewhere in your application?
Yes, of course
% nm MPI_C_SAMPLE.o
Name               | Value            | Type | Size
MPI_Comm_rank      | 0000000000000000 | U    | 0000000000000008
MPI_Comm_size      | 0000000000000000 | U    | 0000000000000008
MPI_Finalize       | 0000000000000000 | U    | 0000000000000008
MPI_Init           | 0000000000000000 | U    | 0000000000000008
MPI_Recv           | 0000000000000000 | U    | 0000000000000008
MPI_Send           | 0000000000000000 | U    | 0000000000000008
_fpdata            | 0000000000000000 | U    | 0000000000000000
lam_mpi_comm_world | 0000000000000000 | U    | 0000000000000000
lam_mpi_int        | 0000000000000000 | U    | 0000000000000000
main               | 0000000000000000 | T    | 0000000000000008
printf             | 0000000000000000 | U    | 0000000000000008
scanf              | 0000000000000000 | U    | 0000000000000008
It's a simple program. It runs with the Compaq and MPICH versions of MPI.
I also tried another simple program and got the same link problem.
#include <stdio.h>
#include "mpi.h"
int main(int argc, char *argv[])
{
    MPI_Status status;
    int num, rank, size, tag, next, from;
    /* Start up MPI */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* Arbitrarily choose 201 to be our tag. Calculate the */
    /* rank of the next process in the ring. Use the modulus */
    /* operator so that the last process "wraps around" to rank */
    /* zero. */
    tag = 201;
    next = (rank + 1) % size;
    from = (rank + size - 1) % size;
    /* If we are the "console" process, get an integer from the */
    /* user to specify how many times we want to go around the */
    /* ring */
    if (rank == 0) {
        printf("Enter the number of times around the ring: ");
        scanf("%d", &num);
        printf("Process %d sending %d to %d\n", rank, num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
    }
    /* Pass the message around the ring. The exit mechanism works */
    /* as follows: the message (a positive integer) is passed */
    /* around the ring. Each time it passes rank 0, it is decremented. */
    /* When each process receives the 0 message, it passes it on */
    /* to the next process and then quits. By passing the 0 first, */
    /* every process gets the 0 message and can quit normally. */
    while (1) {
        MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
        printf("Process %d received %d\n", rank, num);
        if (rank == 0) {
            num--;
            printf("Process 0 decremented num\n");
        }
        printf("Process %d sending %d to %d\n", rank, num, next);
        MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        if (num == 0) {
            printf("Process %d exiting\n", rank);
            break;
        }
    }
    /* The last process does one extra send to process 0, which needs */
    /* to be received before the program can exit */
    if (rank == 0)
        MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
    /* Quit */
    MPI_Finalize();
    return 0;
}
Thanks for your help
Laurence Viry
--
Laurence Viry
CRIP UJF - Projet MIRAGE (http://mirage.imag.fr )
Laboratoire de Modélisation et Calcul - IMAG
tel: 04 76 51 40 83
e-mail: Laurence.Viry_at_[hidden]