[ Sorry for jumping so late in this thread. I just came back from
holiday... ]
On Tue, 5 Oct 2004, Davide Cesari wrote:
> I recently tested LAM-MPI on a double-network cluster (Supermicro
> 2-Xeon boards with 2 e1000 NICs and 2 Gigabit switches), but I did
> not get a bandwidth improvement from the second network
If the 2 NICs share the same PCI bus, it is hard to get high
performance from both of them at the same time. Furthermore, there is
the issue of interrupt affinity and packet reordering with
TCP+bonding+SMP that I already mentioned on this list, which is
probably the limiting factor on the receive side.
What bonding mode did you use? I think that only balance-rr (round
robin) will give increased speed.
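As a rough sketch (the interface names and the IP address are only
examples, and I have not tested this on your hardware), round-robin
bonding with the stock Linux bonding driver is usually set up along
these lines:

  modprobe bonding mode=balance-rr miimon=100
  ifconfig bond0 192.168.1.1 netmask 255.255.255.0 up
  ifenslave bond0 eth0 eth1

With balance-rr the outgoing packets alternate between eth0 and eth1,
which is what gives the potential bandwidth increase, but it is also
the mode most prone to the packet reordering I mentioned above.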
> 2. Using different communication paths between different nodes I got a
> slight improvement in ttcp bandwidth (node A communicating
> simultaneously to nodes B and C using the 2 different networks gave an
> overall bandwidth a little bit higher than just A-B communication, but
> far from double);
This is where setting the interrupt affinity (each NIC's IRQ assigned
to a different CPU) and the bonding mode might help. Perhaps
balance-xor, while making sure that the MAC addresses of the 2 other
computers, taken modulo the slave count (=2 for the 2 NICs), give
different numbers, so that the traffic to B and C goes out over
different slaves.
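As an illustration (the IRQ numbers and interface names below are just
examples; check /proc/interrupts for the real ones on your boards):

  # pin each NIC's interrupt to a different CPU (hex bitmask of CPUs)
  echo 1 > /proc/irq/24/smp_affinity   # eth0 -> CPU0
  echo 2 > /proc/irq/25/smp_affinity   # eth1 -> CPU1
  # load the bonding driver with the XOR policy
  modprobe bonding mode=balance-xor miimon=100

With 2 slaves the XOR policy boils down to (source MAC XOR destination
MAC) modulo 2, so it is effectively the last bit of the peers' MAC
addresses that decides which slave their traffic uses; B's and C's MACs
should differ there.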
> because a previous test with LAM on a double 100Mbit/s network
> (whose results are in a mailing list message cited by Mark) was
> positive.
Heh, that's a completely different story. 2*100Mbit/s is well below
the PCI bandwidth limit, and the rate at which data comes in is much
lower, so the TCP+bonding+SMP problem mentioned above is less severe.
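Just to put rough numbers on it: 2*100Mbit/s is about 25MB/s, far below
even a plain 32bit/33MHz PCI bus (~133MB/s theoretical), while
2*1Gbit/s is about 250MB/s in one direction alone, which already
exceeds it; only a wider/faster bus (64bit/66MHz PCI or PCI-X) really
leaves headroom for both Gigabit NICs.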
--
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]