Hi,
I have seen that the problem of network congestion within the
MPI_Alltoall routine has been discussed earlier on this mailing
list. While trying to enhance the scaling of a molecular
dynamics application on our Gigabit Ethernet cluster, I ran
into this problem as well. In our case the congestion happened
within the switch. I therefore experimented with the
all-to-all code and now have a modified Alltoall for multi-CPU
nodes that shows no congestion even when Ethernet flow control
is turned off. I verified this on our dual-CPU cluster for up
to 32 nodes (64 CPUs). We also use an HP ProCurve 2848 switch.
The performance of the original LAM MPI_Alltoall, however,
remains a bit better for small message sizes. This is similar
to what Pierre found for his modified routines.
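To give a rough idea of the kind of scheme I mean (this is only
a simplified sketch, not the actual routine I would contribute):
the exchanges are ordered into phases so that at any moment each
rank communicates with exactly one partner, which keeps the
switch from being flooded. For a power-of-two process count this
can look like:

    #include <mpi.h>
    #include <string.h>

    /* Sketch of a phased (pairwise-exchange) all-to-all.
     * In phase p, rank r exchanges its block with rank (r XOR p),
     * so every rank talks to exactly one partner per phase.
     * Assumes the number of ranks is a power of two. */
    int phased_alltoall(const void *sendbuf, void *recvbuf, int count,
                        MPI_Datatype dtype, MPI_Comm comm)
    {
        int rank, nprocs, p;
        MPI_Aint lb, ext;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);
        MPI_Type_get_extent(dtype, &lb, &ext);

        /* copy our own block locally */
        memcpy((char *)recvbuf + (MPI_Aint)rank * count * ext,
               (const char *)sendbuf + (MPI_Aint)rank * count * ext,
               (size_t)(count * ext));

        for (p = 1; p < nprocs; p++) {
            int partner = rank ^ p;  /* unique pairing in each phase */
            MPI_Sendrecv((const char *)sendbuf + (MPI_Aint)partner * count * ext,
                         count, dtype, partner, 0,
                         (char *)recvbuf + (MPI_Aint)partner * count * ext,
                         count, dtype, partner, 0,
                         comm, MPI_STATUS_IGNORE);
        }
        return MPI_SUCCESS;
    }

(My own routine differs in its details, in particular in how it
handles the two CPUs per node; the above only illustrates the
general phased-scheduling idea.)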
Will a new all-to-all routine be implemented in a future version
of LAM / OpenMPI? I am willing to contribute my code if there
is interest.
Regards,
Carsten
---------------------------------------------------
Dr. Carsten Kutzner
Max Planck Institute for Biophysical Chemistry
Theoretical and Computational Biophysics Department
Am Fassberg 11
37077 Goettingen, Germany
Tel. +49-551-2012313, Fax: +49-551-2012302
http://www.gwdg.de/~ckutzne