LAM/MPI General User's Mailing List Archives

From: David Ulrich (david.ulrich_at_[hidden])
Date: 2005-11-15 20:05:17


Hi Andrey,

I have had some bad results with Hyper-Threading on Xeons when running 2
identical processes on the same physical CPU...
Have you tried running only 16 lamds, one per real physical CPU? If not,
try it and compare the results before using all 32.
Have you tried putting an MPI_Barrier just before the first MPI_Wtime and
just before the second MPI_Wtime? That might show whether it is a
synchronization problem.
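For example, the timed part could look roughly like this (just a C sketch
of what I mean, assuming the usual Pi example; the variable names are only
placeholders, not your actual code):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank;
        double local = 0.0, global = 0.0, t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* ... compute the local partial sum for Pi here ... */

        MPI_Barrier(MPI_COMM_WORLD);   /* everyone starts the timed part together */
        t0 = MPI_Wtime();

        MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        MPI_Barrier(MPI_COMM_WORLD);   /* everyone has left the reduce */
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("reduce time: %f s\n", t1 - t0);

        MPI_Finalize();
        return 0;
    }

If the Reduce times become stable with the barriers in place, the jitter you
see probably comes from processes arriving at the reduce at different times
rather than from the reduce itself.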

On another note, I have run some tests comparing send/receive against
MPI_Gather and MPI_Scatter... using send/receive is about 5% faster for
me on a small cluster of 2 to 14 P4 nodes (2.4 GHz, Gigabit Ethernet),
and I can't explain that...
I would have expected the two approaches to perform the same.
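The kind of comparison I mean is roughly the following (an illustrative C
sketch only, not the exact code I used; the buffer size and dummy values
are just for the example):

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size, i;
        double local, buf[64];          /* room for up to 64 ranks */

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        local = (double) rank;          /* dummy per-rank value */

        /* Variant 1: collective gather of one double per rank to rank 0 */
        MPI_Gather(&local, 1, MPI_DOUBLE, buf, 1, MPI_DOUBLE, 0,
                   MPI_COMM_WORLD);

        /* Variant 2: the same collection written out with send/receive */
        if (rank == 0) {
            buf[0] = local;
            for (i = 1; i < size; i++)
                MPI_Recv(&buf[i], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
        } else {
            MPI_Send(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

Both variants move the same data to rank 0; only the implementation inside
the library differs.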

Regards

David

On 16 Nov 05, at 00:44, Andrey Kharuk wrote:

> Hi All,
>
> I'm quite new to parallel computing and I'm now trying to understand
> its principles. For that I'm using the trivial program to compute Pi,
> and I've got strange results: the communication time for MPI_Reduce is
> longer than for MPI_Send/MPI_Recv. Another puzzling aspect is that the
> communication time of MPI_Reduce is very unstable and in general depends
> on the computation time as well as the number of processors. For my
> experiments I use a cluster of 8 dual-processor nodes with Hyper-Threading
> (Xeon EM64T). So, as I understand it, the total number of (logical)
> processors is 32 and I use all of them. I looked at the archive and
> found only one discussion that is quite close to my question:
>
> http://www.lam-mpi.org/MailArchives/lam/2004/03/7747.php
>
> Could anyone explain to me why MPI_Reduce works slower than
> MPI_Send/MPI_Recv, and why the communication time for MPI_Reduce is so
> unstable? Or is it because each node runs 4 processes on two physical
> processors, so Hyper-Threading doesn't really help here?
>
> Here are the combined results of my tests. For both variants I used the
> same environment and synchronized the processes with MPI_Barrier just
> before receiving the results of the calculation.
>
> Number of CPUs: 32
>
>   Intervals    Reduce time, s    Send/Recv time, s
>        1024          0.000769             0.000447
>        2048          0.000591             0.000483
>        4096          0.000746             0.000463
>        8192          0.000432             0.000432
>       16384          0.000614             0.000529
>       32768          0.000903             0.000417
>       65536          0.000707             0.000413
>      131072          0.000767             0.000447
>      262144          0.000866             0.000410
>      524288          0.000994             0.000480
>     1048576          0.000970             0.000411
>     2097152          0.000880             0.000418
>     4194304          0.000488             0.000448
>     8388608          0.009120             0.000423
>    16777216          0.000540             0.000341
>    33554432          0.000489             0.000410
>    67108864          0.000524             0.000482
>   134217728          0.001711             0.000898
>
> Thank you,
>
> Andrey
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/