LAM/MPI General User's Mailing List Archives

From: Jacob Vos (jacob.vos_at_[hidden])
Date: 2005-01-03 12:06:12


Hi,

Below is a small program that demonstrates spurious message data being
received. For convenience during development, before moving to the
cluster, I was running on a dual 2.5GHz G5 with a 'lamnode_file' set
to 'cpu=8'. I noticed that, at random, the second receive would
contain the same data as the first receive. I can't reproduce the
anomaly with 'cpu=2'.
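
For reference, the 'lamnode_file' is a one-line boot schema along
these lines (the hostname is whatever the machine calls itself;
'cpu=8' makes LAM treat the host as eight scheduling slots):

localhost cpu=8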

At first I thought my own logic was flawed. However, I could not find
the source of any error. So I wrote the small test program below,
which reproduces the anomaly.

Roughly 0.005% to 0.01% of the send/receives are corrupt. The pattern
is somewhat ambiguous in this test code, because the corrupt value of
the second receive (n) also happens to be the previous iteration's
second send. In my original code, however, where I first found the
anomaly, the preceding sends were unrelated, and whenever the second
receive was corrupt it always held the same value as the first.
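
One way to remove that ambiguity in the test itself would be to draw
the two payloads from disjoint ranges, so a stale or cross-matched
message is immediately identifiable; a minimal variation of the two
assignments in the loop (the offset is arbitrary, chosen only to keep
the ranges apart):

    int send1 = n;              // values stay in [0, 1000000)
    int send2 = n + 10000000;   // values stay in [10000000, 11000000)

With that change, a corrupt recv2 equal to recv1 could only have come
from the tag-1 message of the same iteration.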

Please confirm that this is indeed a bug, point out any misuse of
asynchronous communication on my part, or tell me that oversubscribing
with 'cpu=8' is inherently flawed.

A typical output would be:

CPU 4 recv2: 2984 != 2983
CPU 5 recv2: 71459 != 71458
CPU 7 recv2: 122923 != 122922
CPU 6 recv2: 156124 != 156123
CPU 3 recv2: 185705 != 185704
CPU 0 recv2: 350950 != 350949
CPU 0 recv2: 356951 != 356950
CPU 4 recv2: 449649 != 449648

Thanks all,

Jake

---------------------------------------------------------

#include "mpi.h"
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
  int numtasks, rank;

  MPI_Init(&argc,&argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  
// Ring neighbors: each rank receives from the previous rank and
  // sends to the next, with wraparound at both ends.
  int prev = rank-1;
  int next = rank+1;
  if (rank == 0) prev = numtasks - 1;
  if (rank == (numtasks - 1)) next = 0;

  for(int n=0; n < 1000000; n++)
  {
    int send1 = n;
    int send2 = n+1;

    // Post both receives before the sends, then wait for all four
    // requests to complete; tags 1 and 2 distinguish the two messages.
    int recv1, recv2;
    MPI_Request reqs[4];
    MPI_Status stats[4];
    MPI_Irecv(&recv1, 1, MPI_INT, prev, 1, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(&recv2, 1, MPI_INT, prev, 2, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(&send1, 1, MPI_INT, next, 1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(&send2, 1, MPI_INT, next, 2, MPI_COMM_WORLD, &reqs[3]);
    MPI_Waitall(4, reqs, stats);

    // Every rank sends and receives the same pair of values per round,
    // so any mismatch indicates corrupt message data.
    if(send1 != recv1)
      cout << "CPU " << rank << " recv1: " << send1 << " != " << recv1 << endl;
    if(send2 != recv2)
      cout << "CPU " << rank << " recv2: " << send2 << " != " << recv2 << endl;
  }

  MPI_Finalize();
  return 0;
}
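
For completeness, I build and run it along these lines (the file name
is arbitrary; 'C' is LAM mpirun shorthand for one process per declared
CPU):

mpiCC ring_test.cc -o ring_test
lamboot lamnode_file
mpirun C ring_test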