On Nov 1, 2005, at 4:59 PM, wgl wrote:
> I have a small test code here that tries to test the performance of
> MPI_Allgatherv, and I was shocked by how much waiting time
> MPI_Allgatherv consumed. I'm sure something is wrong with my code.
> Could someone help me modify it? Thanks a lot!
These numbers are unexpected, but I really can't give any answers as
to why (based on the information you gave). Our Allgatherv
implementation is really simple - it's basically:
  for (int i = 0; i < nprocs; ++i) {
    MPI_Gatherv(sendbuf, sendcount, sendtype, recvbuf, recvcounts,
                displs, recvtype, i, comm);
  }
That means each process gets slammed with every other proc's data,
one root at a time - not exactly the best use of the network :/.
You're sending roughly 5.7MB of data from each process (rank 0 sends
nothing, since its count is zero), so there's a fairly sizable spike
in network traffic all at once.
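If you want to check whether that root-at-a-time pattern is really
the bottleneck, one experiment is to time a ring-style exchange
against MPI_Allgatherv. Here's a minimal, untested sketch (my
illustration, not LAM's code - ring_allgatherv_int is a made-up name,
and it assumes MPI_INT data with each rank's own block already stored
at its displacement in recvbuf, like your program does):

#include <mpi.h>

/* Hypothetical ring allgatherv for MPI_INT blocks. In each of the
 * p-1 steps a rank forwards the block it most recently obtained to
 * (rank+1)%p and receives one from (rank-1+p)%p, so every block
 * travels once around the ring and no single root gets swamped. */
static void ring_allgatherv_int(int *recvbuf, const int *counts,
                                const int *displs, MPI_Comm comm)
{
  int rank, p;
  MPI_Status st;
  MPI_Comm_rank(comm, &rank);
  MPI_Comm_size(comm, &p);
  for (int step = 0; step < p - 1; ++step) {
    int sendblk = (rank - step + p) % p;      /* block forwarded now */
    int recvblk = (rank - step - 1 + p) % p;  /* block arriving now  */
    MPI_Sendrecv(recvbuf + displs[sendblk], counts[sendblk], MPI_INT,
                 (rank + 1) % p, 0,
                 recvbuf + displs[recvblk], counts[recvblk], MPI_INT,
                 (rank - 1 + p) % p, 0, comm, &st);
  }
}

If the ring comes out much faster at this message size, that points
at the gatherv-per-root pattern rather than at your code.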
Just out of curiosity, how are you running this code? What network,
how many processors per node, how many processes per node, etc?
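One note on the measurement itself, too: rank 0 contributes zero
rows, so its comutime is essentially zero and it reaches the
Allgatherv about 5.5 seconds before everyone else - that skew, plus
any other pre-call imbalance, gets charged to the collective in your
totaltime numbers. To time just the collective, barrier first and
report the max across ranks. A sketch that could replace the timing
portion of your program (reusing your matrix, my_offset, counts, and
displacements variables):

  MPI_Barrier(MPI_COMM_WORLD);   /* line the ranks up first */
  double t0 = MPI_Wtime();
  MPI_Allgatherv(&matrix[my_offset], counts[myRank], MPI_INT,
                 matrix, counts, displacements, MPI_INT,
                 MPI_COMM_WORLD);
  double t1 = MPI_Wtime();

  /* For a collective, the max over all ranks is the number that
   * actually matters. */
  double local = t1 - t0, worst;
  MPI_Reduce(&local, &worst, 1, MPI_DOUBLE, MPI_MAX, 0,
             MPI_COMM_WORLD);
  if (myRank == 0)
    printf("Allgatherv took %f s (max over ranks)\n", worst);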
Brian
> +++++++++++++++++++++++++++++++++++++++++++
> results (times in seconds)
>
> node 0 comutime 0.000001 totaltime 13.694220
> node 1 comutime 5.649959 totaltime 95.133599
> node 2 comutime 5.464476 totaltime 95.928307
> node 3 comutime 5.464566 totaltime 96.483795
> node 4 comutime 5.540017 totaltime 97.016238
> node 5 comutime 5.466977 totaltime 97.538457
> node 6 comutime 5.618389 totaltime 98.086424
> node 7 comutime 5.490053 totaltime 98.649446
> node 8 comutime 5.541841 totaltime 99.408869
> node 9 comutime 5.466705 totaltime 99.961623
> node 10 comutime 5.466315 totaltime 100.550240
> node 11 comutime 5.515423 totaltime 101.133764
> node 12 comutime 5.516169 totaltime 101.655122
> node 13 comutime 5.480149 totaltime 101.737988
> node 14 comutime 5.386903 totaltime 101.799232
> ++++++++++++++++++++++++++++++++++++++++++
>
> code:
>
> #include <stdio.h>
> #include <mpi.h>
>
> const int Nrows = 20000;
> const int Ncolumns = 1000;
>
> int main (int argc, char **argv) {
>   int myRank, Nprocessors;
>
>   MPI_Init(&argc, &argv);
>   MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
>   MPI_Comm_size(MPI_COMM_WORLD, &Nprocessors);
>
>   if (Nprocessors < 2) {
>     fprintf(stderr, "need at least 2 processes\n");
>     MPI_Abort(MPI_COMM_WORLD, 1);
>   }
>
>   // Variable-length arrays are not standard C++, so allocate on
>   // the heap (the matrix is far too big for the stack anyway).
>   int *counts        = new int[Nprocessors];
>   int *displacements = new int[Nprocessors];
>   int *offsets       = new int[Nprocessors];
>   int *bins          = new int[Nprocessors];
>   int *matrix        = new int[Nrows*Ncolumns];
>
>   // Rank 0 contributes no rows; the other ranks split Nrows
>   // evenly, with the remainder spread over the first few.
>   int bin = Nrows / (Nprocessors - 1);
>   int rem = Nrows % (Nprocessors - 1);
>   bins[0] = 0;
>   for (int i = 1; i < Nprocessors; i++)
>     bins[i] = bin + (i <= rem ? 1 : 0);
>
>   counts[0] = 0;
>   displacements[0] = 0;
>   for (int i = 1; i < Nprocessors; i++) {
>     counts[i] = bins[i] * Ncolumns;
>     displacements[i] = displacements[i-1] + counts[i-1];
>   }
>
>   // Every rank just computed the full counts[] itself, so the
>   // MPI_Allgather on counts that was here was redundant (and it
>   // passed overlapping send/receive buffers, which MPI forbids).
>
>   int my_offset = displacements[myRank];
>
>   // Gathered on rank 0 but never used afterwards.
>   MPI_Gather(&my_offset, 1, MPI_INT, offsets, 1, MPI_INT, 0,
>              MPI_COMM_WORLD);
>
>   double start_time = MPI_Wtime();
>   for (int i = 0; i < counts[myRank]; i++) {
>     matrix[my_offset + i] = myRank*1000 + i;
>     // Crude busy-wait; volatile keeps the optimizer from
>     // deleting the empty loop.
>     for (volatile int j = 0; j < 1000; j++) {}
>   }
>   double end_time = MPI_Wtime();
>
>   double computation = end_time - start_time;
>
>   // MPI_Barrier(MPI_COMM_WORLD);
>   // Note: the send buffer aliases the receive buffer here, which
>   // MPI technically forbids; MPI-2's MPI_IN_PLACE (or a separate
>   // send buffer) is the strictly correct way to do this.
>   MPI_Allgatherv(&matrix[my_offset], counts[myRank], MPI_INT,
>                  matrix, counts, displacements, MPI_INT,
>                  MPI_COMM_WORLD);
>   // MPI_Barrier(MPI_COMM_WORLD);
>
>   end_time = MPI_Wtime();
>   double totaltime = end_time - start_time;
>
>   printf("node %d comutime %lf totaltime %lf\n", myRank,
>          computation, totaltime);
>
>   delete[] matrix;
>   delete[] bins;
>   delete[] offsets;
>   delete[] displacements;
>   delete[] counts;
>
>   MPI_Finalize();
>   return 0;
> }
>
> Guoli Wang, Ph.D.
> Bioinformatics Group
> Fox Chase Cancer Center
>
> Phone: (215)-214-4261
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/