Hi Jake,
I cannot confirm your findings: I ran your code on my setup and the
program terminated without detecting any fault. Perhaps you could supply
more details about your setup so the source of the problem can be located.
My configuration is given below - I hope it helps.
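
In the meantime, one way to rule out the nonblocking request handling on
your side would be a blocking variant of the exchange. Here is a minimal
sketch (derived from your test program, but untested on a G5) that keeps
the same ring layout and payloads and uses MPI_Sendrecv instead of
MPI_Irecv/MPI_Isend/MPI_Waitall; if this version never reports a mismatch
with 'cpu=8' while the original does, the problem is more likely on the
LAM side than in your logic.

#include "mpi.h"
#include <iostream>
using namespace std;

int main(int argc, char *argv[])
{
    int numtasks, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Same ring neighbours as in the original test program.
    int prev = (rank == 0) ? numtasks - 1 : rank - 1;
    int next = (rank == numtasks - 1) ? 0 : rank + 1;

    for (int n = 0; n < 1000000; n++)
    {
        int send1 = n;
        int send2 = n + 1;
        int recv1, recv2;
        MPI_Status stat1, stat2;

        // Blocking combined send/receive: no request objects involved.
        MPI_Sendrecv(&send1, 1, MPI_INT, next, 1,
                     &recv1, 1, MPI_INT, prev, 1,
                     MPI_COMM_WORLD, &stat1);
        MPI_Sendrecv(&send2, 1, MPI_INT, next, 2,
                     &recv2, 1, MPI_INT, prev, 2,
                     MPI_COMM_WORLD, &stat2);

        if (send1 != recv1) cout << "CPU " << rank << " recv1: "
                                 << send1 << " != " << recv1 << endl;
        if (send2 != recv2) cout << "CPU " << rank << " recv2: "
                                 << send2 << " != " << recv2 << endl;
    }

    MPI_Finalize();
    return 0;
}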
Cheers,
Michael
___
Hardware: IBM X31 laptop, 1.5 GHz Intel Pentium M
OS: Ubuntu Linux/Debian
LAM: Installed from Debian package, 7.0.6, i386-pc-linux-gnu
I added this line to have at least some output:
cout << "rank " << rank << " out of " << numtasks << " instances." << endl;
mig_at_ubuntu:~/lamtest $ mpic++ test.cpp
mig_at_ubuntu:~/lamtest $ cat hostfile
localhost cpu=8
mig_at_ubuntu:~/lamtest $ mpirun -c 8 a.out
rank 0 out of 8 instances.
rank 2 out of 8 instances.
rank 1 out of 8 instances.
rank 4 out of 8 instances.
rank 3 out of 8 instances.
rank 7 out of 8 instances.
rank 5 out of 8 instances.
rank 6 out of 8 instances.
mig_at_ubuntu:~/lamtest $
mig_at_ubuntu:~/lamtest $ uname -a
Linux ubuntu 2.6.8.1 #1 Mon Nov 29 16:56:41 CET 2004 i686 GNU/Linux
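
One more thing that might help narrow it down: MPI_Waitall fills in the
status array, so on a mismatch you can print where the second message was
actually matched and compare recv2 against recv1 in the same iteration.
Below is a small hypothetical helper (the name, and the idea of calling it
right after MPI_Waitall(4, reqs, stats) in your loop, are my suggestion,
not part of your code):

#include "mpi.h"
#include <iostream>
using namespace std;

// stats[1] belongs to the recv2 request, so its MPI_SOURCE and MPI_TAG
// fields show which rank and tag the second message was matched to.
// If recv2 equals recv1, both receives picked up the same payload; if it
// equals the previous iteration's send2, the data is stale instead.
void check_recv2(int rank, int send2, int recv1, int recv2,
                 const MPI_Status stats[4])
{
    if (send2 != recv2)
    {
        cout << "CPU " << rank << " recv2: " << send2 << " != " << recv2
             << " (recv1 = " << recv1
             << ", source = " << stats[1].MPI_SOURCE
             << ", tag = " << stats[1].MPI_TAG << ")" << endl;
    }
}
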
On Monday, 03.01.2005, at 12:06 -0500, Jacob Vos wrote:
> Hi,
>
> Below is a small program demonstrating some spurious message data being
> received. For convenience during development, before I moved to the
> cluster, I was using a dual 2.5 GHz G5 with a 'lamnode_file' set to
> 'cpu=8'. I noticed that the second receive would randomly contain the
> same data as the first receive. I cannot reproduce the anomaly with
> 'cpu=2'.
>
> I thought my logic might be flawed, but I could not find the source of
> my error. So I wrote this small test program, and with it I was able to
> reproduce the anomaly.
>
> Basically, about 0.005% to 0.01% of the send/receives are corrupt. This
> is not obvious in this test code, because the value of the last send
> happens to be n-1. However, in the original code in which I found the
> anomaly, the previous set of sends was not related. The second receive,
> when it was corrupt, always had the same value as the first.
>
> Please confirm that this is indeed a bug, inform me of my poor use of
> asynchronous communication, or indicate that using 'cpu=8' is flawed.
>
> A typical output would be:
>
> CPU 4 recv2: 2984 != 2983
> CPU 5 recv2: 71459 != 71458
> CPU 7 recv2: 122923 != 122922
> CPU 6 recv2: 156124 != 156123
> CPU 3 recv2: 185705 != 185704
> CPU 0 recv2: 350950 != 350949
> CPU 0 recv2: 356951 != 356950
> CPU 4 recv2: 449649 != 449648
>
> Thanks all,
>
> Jake
>
> ---------------------------------------------------------
>
> #include "mpi.h"
> #include <iostream>
> using namespace std;
>
> int main(int argc, char *argv[])
> {
>     int numtasks, rank;
> 
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     int prev = rank - 1;
>     int next = rank + 1;
>     if (rank == 0) prev = numtasks - 1;
>     if (rank == (numtasks - 1)) next = 0;
> 
>     for (int n = 0; n < 1000000; n++)
>     {
>         int send1 = n;
>         int send2 = n + 1;
> 
>         int recv1, recv2;
>         MPI_Request reqs[4];
>         MPI_Status stats[4];
>         MPI_Irecv(&recv1, 1, MPI_INT, prev, 1, MPI_COMM_WORLD, &reqs[0]);
>         MPI_Irecv(&recv2, 1, MPI_INT, prev, 2, MPI_COMM_WORLD, &reqs[1]);
>         MPI_Isend(&send1, 1, MPI_INT, next, 1, MPI_COMM_WORLD, &reqs[2]);
>         MPI_Isend(&send2, 1, MPI_INT, next, 2, MPI_COMM_WORLD, &reqs[3]);
>         MPI_Waitall(4, reqs, stats);
> 
>         if (send1 != recv1) cout << "CPU " << rank << " recv1: " << send1 << " != " << recv1 << endl;
>         if (send2 != recv2) cout << "CPU " << rank << " recv2: " << send2 << " != " << recv2 << endl;
>     }
> 
>     MPI_Finalize();
> }