Hi All,
I'm quite new to parallel computing and I'm trying to understand its
principles. For that I'm using the trivial program that computes Pi,
and I've got strange results: the communication time for MPI_Reduce is
longer than for MPI_Send/MPI_Recv. Another puzzling thing is that the
communication time of MPI_Reduce is very unstable and generally depends
on the computation time as well as on the number of processors. For my
experiments I use a cluster of 8 dual-processor nodes with
Hyper-Threading (Xeon EM64T), so, as I understand it, the total number
of logical processors is 32 and I use all of them. I searched the
archive and found only one discussion that comes close to my question:
http://www.lam-mpi.org/MailArchives/lam/2004/03/7747.php
Could anyone explain to me why MPI_Reduce works slower than
MPI_Send/MPI_Recv and why the communication time for MPI_Reduce is so
unstable? Could it be because each node runs 4 processes on two
physical processors, so Hyper-Threading doesn't really help here?
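In case the process placement matters, this is roughly how I boot LAM and
start the job; the host names and the program name below are placeholders,
not my real ones:

  # boot schema: 8 nodes, 4 logical CPUs each (2 physical + Hyper-Threading)
  node01 cpu=4
  node02 cpu=4
  (... and so on up to node08, each with cpu=4)

  $ lamboot -v bhost
  $ mpirun -np 32 ./pi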
Here are the combined results of my tests. For both variants I used the
same environment and synchronized the processes with MPI_Barrier just
before collecting the results of the calculation; a minimal sketch of
what I actually time follows the table.
Number of CPUs: 32

  Intervals     Reduce time, s   Send/Recv time, s
       1024         0.000769        0.000447
       2048         0.000591        0.000483
       4096         0.000746        0.000463
       8192         0.000432        0.000432
      16384         0.000614        0.000529
      32768         0.000903        0.000417
      65536         0.000707        0.000413
     131072         0.000767        0.000447
     262144         0.000866        0.000410
     524288         0.000994        0.000480
    1048576         0.000970        0.000411
    2097152         0.000880        0.000418
    4194304         0.000488        0.000448
    8388608         0.009120        0.000423
   16777216         0.000540        0.000341
   33554432         0.000489        0.000410
   67108864         0.000524        0.000482
  134217728         0.001711        0.000898
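For reference, here is a minimal sketch of the kind of test program I run.
It is a reconstruction of the structure rather than my exact code: the
variable names, the hard-coded interval count, and the cyclic distribution
of intervals over ranks are just for illustration. The only difference
between the two timed sections is the collection step itself.

/* pi.c: compare MPI_Reduce vs. explicit MPI_Send/MPI_Recv for collecting
 * the partial sums of the standard Pi integration. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, src;
    long i, n = 1048576;                 /* number of intervals */
    double h, x, sum = 0.0, pi, part, t0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* local computation: midpoint rule, each rank takes every size-th interval */
    h = 1.0 / (double)n;
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    sum *= h;

    /* Variant 1: collect the partial sums with MPI_Reduce */
    MPI_Barrier(MPI_COMM_WORLD);         /* synchronize just before communication */
    t0 = MPI_Wtime();
    MPI_Reduce(&sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Reduce:    pi = %.12f  time = %f s\n", pi, MPI_Wtime() - t0);

    /* Variant 2: collect the partial sums with explicit MPI_Send/MPI_Recv */
    MPI_Barrier(MPI_COMM_WORLD);         /* synchronize just before communication */
    t0 = MPI_Wtime();
    if (rank == 0) {
        pi = sum;
        for (src = 1; src < size; src++) {
            MPI_Recv(&part, 1, MPI_DOUBLE, src, 0, MPI_COMM_WORLD, &status);
            pi += part;
        }
        printf("Send/Recv: pi = %.12f  time = %f s\n", pi, MPI_Wtime() - t0);
    } else {
        MPI_Send(&sum, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}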
Thank you,
Andrey