
LAM/MPI General User's Mailing List Archives


From: Laurence Viry (Laurence.Viry_at_[hidden])
Date: 2003-06-16 05:35:57


> A trap we fell into was that our system (Linux, not Tru64) had MPI include files for other MPI implementations tucked away here and there. We had to root out all non-LAM MPI files and "hide" them by moving them into subdirectories. It is possible that you are inadvertently picking up the wrong .h file. Did you compile MPI_C_SAMPLE.o with mpicc, or gcc (or pgcc or icc)? MPICH and LAM include files, for example, are absolutely incompatible with one another.

I had thought about this problem, so I installed LAM in a dedicated directory
(configure --prefix=/usr/local/mpi/lam-6.5.9). I removed all MPI files
(from an old MPICH version) from /usr/local, and I checked the result with
the -showme option of the mpicc command.

I used mpicc. To show which mpicc command is being used, I ran:
% which mpicc
> /usr/local/mpi/lam-6.5.9/bin/mpicc

% mpicc -showme -c MPI_C_SAMPLE.c
>cc -I/usr/local/mpi/lam-6.5.9/include -c MPI_C_SAMPLE.c

So I seem to be using the correct .h files.

Then, for the link step:

% mpicc -o hello MPI_C_SAMPLE.o -showme
> cc -I/usr/local/mpi/lam-6.5.9/include -L/usr/local/mpi/lam-6.5.9/lib -o hello MPI_C_SAMPLE.o -llammpio -lpmpi -llamf77mpi -lmpi -llam -laio -lutil

Without -showme, I got:

ld:
Unresolved:
lam_mpi_comm_world
lam_mpi_int

I was surprised to observe that with:

cc -c MPI_C_SAMPLE.c
cc -I/usr/local/mpi/lam-6.5.9/include -L/usr/local/mpi/lam-6.5.9/lib -o
hello MPI_C_SAMPLE.o -llammpio -lpmpi -llamf77mpi -lmpi -llam -laio
-lutil

the link is OK, but the execution is not! When MPI_C_SAMPLE.c is compiled
without -I..., the Compaq version of the .h files is used.
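
One way to confirm which mpi.h a given compile line really picks up is to test implementation-specific preprocessor macros. The sketch below is only an illustration: it assumes LAM's mpi.h defines LAM_MPI and that MPICH-derived headers define MPICH_NAME or MPICH_VERSION (please check this in your own mpi.h), and WITH_MPI_HEADER is a hypothetical switch invented for this example.

```c
#include <stdio.h>

/* WITH_MPI_HEADER is a hypothetical switch for this sketch: pass
 * -DWITH_MPI_HEADER to pull in whichever mpi.h the compiler finds first. */
#ifdef WITH_MPI_HEADER
#include "mpi.h"
#endif

/* Report which implementation's mpi.h was seen at compile time.
 * Assumption: LAM defines LAM_MPI; MPICH-derived headers define
 * MPICH_NAME or MPICH_VERSION. Anything else is reported as unknown. */
const char *mpi_header_flavour(void)
{
#if defined(LAM_MPI)
    return "LAM/MPI";
#elif defined(MPICH_NAME) || defined(MPICH_VERSION)
    return "MPICH or an MPICH derivative";
#else
    return "unknown";
#endif
}

/* Convenience helper: print the result on one line. */
void print_mpi_header_flavour(void)
{
    printf("mpi.h flavour: %s\n", mpi_header_flavour());
}
```

Calling print_mpi_header_flavour() from a small main, built once with "mpicc -DWITH_MPI_HEADER" and once with "cc -DWITH_MPI_HEADER", should show whether the two compile lines see different headers.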

At execution:

% lamboot -v hostfile
% mpirun -np 2 hello

-----------------------------------------------------------------------------
It seems that [at least] one of processes that was started with mpirun
did not invoke MPI_INIT before quitting (it is possible that more than
one process did not invoke MPI_INIT -- mpirun was only notified of the
first one, which was on node n0).

mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------

===========================================
I was asked:

> This doesn't look like a platform problem, actually.

I also have the feeling that it is not a platform problem, but I don't
understand...

> Did you call MPI_Init somewhere in your application?

Yes, of course

% nm MPI_C_SAMPLE.o

Name               | Value            | Type | Size
MPI_Comm_rank      | 0000000000000000 | U    | 0000000000000008
MPI_Comm_size      | 0000000000000000 | U    | 0000000000000008
MPI_Finalize       | 0000000000000000 | U    | 0000000000000008
MPI_Init           | 0000000000000000 | U    | 0000000000000008
MPI_Recv           | 0000000000000000 | U    | 0000000000000008
MPI_Send           | 0000000000000000 | U    | 0000000000000008
_fpdata            | 0000000000000000 | U    | 0000000000000000
lam_mpi_comm_world | 0000000000000000 | U    | 0000000000000000
lam_mpi_int        | 0000000000000000 | U    | 0000000000000000
main               | 0000000000000000 | T    | 0000000000000008
printf             | 0000000000000000 | U    | 0000000000000008
scanf              | 0000000000000000 | U    | 0000000000000008

It is a simple program. It runs with the Compaq and MPICH versions of MPI.
I also tried with another simple program and got the same link problem.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
  MPI_Status status;
  int num, rank, size, tag, next, from;

  /* Start up MPI */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
 
  /* Arbitrarily choose 201 to be our tag. Calculate the */
  /* rank of the next process in the ring. Use the modulus */
  /* operator so that the last process "wraps around" to rank */
  /* zero. */

  tag = 201;
  next = (rank + 1) % size;
  from = (rank + size - 1) % size;

  /* If we are the "console" process, get an integer from the */
  /* user to specify how many times we want to go around the */
  /* ring */

  if (rank == 0) {
    printf("Enter the number of times around the ring: ");
    scanf("%d", &num);

    printf("Process %d sending %d to %d\n", rank, num, next);
    MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
  }

  /* Pass the message around the ring. The exit mechanism works */
  /* as follows: the message (a positive integer) is passed */
  /* around the ring. Each time it passes rank 0, it is decremented. */
  /* When each process receives the 0 message, it passes it on */
  /* to the next process and then quits. By passing the 0 first, */
  /* every process gets the 0 message and can quit normally. */

  while (1) {

    MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);
    printf("Process %d received %d\n", rank, num);

    if (rank == 0) {
      num--;
      printf("Process 0 decremented num\n");
    }

    printf("Process %d sending %d to %d\n", rank, num, next);
    MPI_Send(&num, 1, MPI_INT, next, tag, MPI_COMM_WORLD);

    if (num == 0) {
      printf("Process %d exiting\n", rank);
      break;
    }
  }

  /* The last process does one extra send to process 0, which needs */
  /* to be received before the program can exit */

  if (rank == 0)
    MPI_Recv(&num, 1, MPI_INT, from, tag, MPI_COMM_WORLD, &status);

  /* Quit */

  MPI_Finalize();
  return 0;
}
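
The neighbour arithmetic used in the program above can be checked without any MPI installation. This is a minimal sketch of the same two expressions; ring_next, ring_prev and print_ring are hypothetical helper names, not part of the original program.

```c
#include <stdio.h>

/* Rank that receives our message: the last rank wraps around to 0. */
int ring_next(int rank, int size)
{
    return (rank + 1) % size;
}

/* Rank we receive from: adding size before subtracting 1 keeps the
 * left operand of % non-negative, so rank 0 maps to size - 1. */
int ring_prev(int rank, int size)
{
    return (rank + size - 1) % size;
}

/* Print the neighbours of every rank in a ring of the given size. */
void print_ring(int size)
{
    int rank;
    for (rank = 0; rank < size; rank++) {
        printf("rank %d: from %d, next %d\n",
               rank, ring_prev(rank, size), ring_next(rank, size));
    }
}
```

With "mpirun -np 2", each of the two ranks has the other as both its "from" and its "next" neighbour, which is what ring_prev and ring_next return for size 2.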

Thanks for your help

Laurence Viry

-- 
Laurence Viry
CRIP UJF - Projet MIRAGE (http://mirage.imag.fr )
Laboratoire de Modélisation et Calcul - IMAG
tel: 04 76 51 40 83
e-mail: Laurence.Viry_at_[hidden]