[ Sorry for coming to this thread so late... ]
On Tue, 16 Aug 2005, Pierre Valiron wrote:
> However I have performed some experiments with Fortran code after
> enabling hardware flow control on the gigabit interfaces.
If enabling hardware flow control improves performance, the switch
that you are using might be the bottleneck: it may not be able to
cope with the simultaneous transfers on all (or most of) its ports
that an all-to-all communication generates. This is typical of a
switch whose backplane bandwidth is lower than the sum of the
bandwidths of all its ports (an oversubscribed switch).
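To put some (purely illustrative) numbers on this, since I don't know
your switch's specs: a 24-port gigabit switch would need
24 x 1 Gbit/s = 24 Gbit/s of backplane bandwidth in each direction to
be non-blocking; if its fabric handles only, say, 10 Gbit/s, then an
all-to-all in which many ports transmit at once forces it to drop or
pause frames - and pausing is exactly what hardware flow control does.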
> I can understand why this contention is less severe for small
> buffers, which may fit in IP and TCP stacks
On Linux, there are system-wide variables that allow setting the
buffer sizes for TCP (/proc/sys/net/ipv4/tcp_*mem) - maybe you can
find something similar for Solaris, which should be easier now that
the source is available...
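The application can also ask for larger buffers per connection via
setsockopt(), which works on both Linux and Solaris. A minimal C
sketch (the 256 KB value is purely illustrative, not a
recommendation):

    #include <stdio.h>
    #include <sys/socket.h>

    int main(void)
    {
        int s = socket(AF_INET, SOCK_STREAM, 0);
        int size = 256 * 1024;  /* illustrative value only */

        if (s < 0) {
            perror("socket");
            return 1;
        }
        /* ask the kernel for larger per-socket buffers; the request
           is silently clamped to the system-wide maxima */
        if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size)) < 0)
            perror("SO_SNDBUF");
        if (setsockopt(s, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size)) < 0)
            perror("SO_RCVBUF");
        return 0;
    }
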
> In order to limit the concurrency per interface, the buffers should be
> exchanged in an orderly fashion, with a single buffer being read and
> written at a time through a given interface.
This indeed reduces network contention, but it probably increases
latency due to the larger number of context switches and the waiting
that is done in userspace. When you just push all your data to the
kernel, the transmissions can be optimized in the kernel, with higher
timing precision and fewer context switches.
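For the record, here is a minimal sketch (in C with MPI, and assuming
a power-of-two number of processes - a simplification on my part) of
such an ordered pairwise exchange, where each process reads and
writes exactly one buffer per step:

    #include <mpi.h>
    #include <string.h>

    /* Ordered pairwise all-to-all: in step s, rank r exchanges one
       block of len bytes with rank (r XOR s), so each rank talks to
       exactly one partner at a time.  Assumes nprocs is a power of
       two; buffer layout (nprocs blocks of len bytes) is illustrative. */
    void pairwise_alltoall(char *sendbuf, char *recvbuf, int len,
                           MPI_Comm comm)
    {
        int rank, nprocs, step;

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &nprocs);

        /* keep our own block locally */
        memcpy(recvbuf + (size_t)rank * len,
               sendbuf + (size_t)rank * len, len);

        for (step = 1; step < nprocs; step++) {
            int partner = rank ^ step;  /* one partner per step */
            MPI_Sendrecv(sendbuf + (size_t)partner * len, len,
                         MPI_CHAR, partner, 0,
                         recvbuf + (size_t)partner * len, len,
                         MPI_CHAR, partner, 0,
                         comm, MPI_STATUS_IGNORE);
        }
    }

Since the XOR pairing is a perfect matching in every step, no
interface ever carries more than one stream in each direction.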
I would like to ask for another data point: can you try your
all-to-all algorithm again, but with hardware flow control disabled?
In theory at least, thanks to the ordered pairwise communications,
the switch should now be less likely to saturate, so hardware flow
control should not make much of a difference (if the switch is indeed
the bottleneck).
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]