
LAM/MPI General User's Mailing List Archives


From: Robin Humble (rjh_at_[hidden])
Date: 2004-10-05 21:19:41


On Tue, Oct 05, 2004 at 08:04:23PM +0200, Bogdan Costescu wrote:
>On Tue, 5 Oct 2004, Pierre Valiron wrote:
Re: ring topology:
>> which might be usable provided the network is fast (Myrinet,
>> Infiniband, etc) and N is small enough.
>This works well for applications where most (or all) of the
>communication is done between neighbours. Otherwise, to reach a node
>from another node, a message might need to pass through a few others
>which will increase latency a lot, especially if the NIC is not
>prepared to do this operation on its own and the host CPU(s) have to
>actively be involved.

We have a 2D mesh of 264 dual Xeon machines connected to 24-port gigE
switches that performs well.
Each machine has 2 e1000's and is connected to an 'x' switch and a 'y'
switch. There is a worst case of one hop through a Linux router (all
nodes do routing) to get from anywhere to anywhere on the machine.
You do take a fair hit in bandwidth (~ 30%) when routing at gigabit
speeds, and of course one additional helping of latency too.
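
To make the "worst case of one hop" claim concrete, here is a minimal
sketch. It assumes (this is not spelled out above) that nodes are
labelled (row, col), that all nodes in a row share an 'x' switch, and
that all nodes in a column share a 'y' switch. The function name
mesh_path is hypothetical.

```python
def mesh_path(src, dst):
    """Return the nodes a packet visits from src to dst as (row, col) tuples.

    Assumed layout: nodes in the same row share an 'x' switch and
    nodes in the same column share a 'y' switch.
    """
    (src_row, src_col), (dst_row, dst_col) = src, dst
    if src == dst:
        return [src]
    if src_row == dst_row or src_col == dst_col:
        # Both nodes hang off a common switch: no routing hop needed.
        return [src, dst]
    # Otherwise forward through the node that sits in the source's row
    # and the destination's column; it shares a switch with each end.
    return [src, (src_row, dst_col), dst]


# Number of intermediate routing nodes for a diagonal transfer.
hops = len(mesh_path((0, 0), (3, 5))) - 2
```

Under these assumptions any pair of nodes that does not share a switch
is exactly one routing node apart, which matches the figure quoted above.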

A 2D mesh with 24-port switches could scale up to about 576 nodes,
but you'd probably want a thin/fat tree network for NFS etc., so you
lose a few ports on either 'x' or 'y' switches to this.
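
The 576 figure is just 24 * 24. A small sketch of the arithmetic,
including the effect of giving up ports for a second network; the
function name and the "2 reserved ports" example are hypothetical, not
numbers from our machine.

```python
def mesh_capacity(ports_per_switch, reserved_x=0, reserved_y=0):
    """Maximum node count of a 2D mesh built from identical switches.

    reserved_x / reserved_y are ports on each 'x' / 'y' switch that are
    set aside (e.g. for uplinks) and so cannot hold a compute node.
    """
    nodes_per_x_switch = ports_per_switch - reserved_x
    nodes_per_y_switch = ports_per_switch - reserved_y
    return nodes_per_x_switch * nodes_per_y_switch


full = mesh_capacity(24)                     # every port holds a node
with_uplinks = mesh_capacity(24, reserved_x=2)  # illustrative value only
```

With no reserved ports this gives 576; reserving 2 ports on every 'x'
switch would drop it to 528.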

Typically you'd have a set of names associated with the fat tree
network, and another set of names for the fast 2D mesh network.
LAM supports this neatly with the 'lam-hostmap.txt' capability.

Large (a few hundred port) gigE switches often seem to have limited
bandwidth or CPU somewhere and rarely (in my limited experience) do as well as
you'd expect from the spec sheet.
A stack of 24-port switches (27 + a spare in our case) is a fraction of
the price of one of these large switches.
The routing tables are a bit of a nightmare, so you have to write
programs to generate them for you. We can put an updated version of our
routing code online if anyone is interested.
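
To give a flavour of what such a generator looks like, here is a
minimal sketch. It is not our actual routing code, and the addressing
scheme is invented for illustration: node (row, col) is assumed to have
address 10.1.<row>.<col> on its row ('x') subnet and 10.2.<col>.<row>
on its column ('y') subnet, with rows and columns numbered from 1. The
output lines use the standard Linux `ip route add ... via ...` form.

```python
def routes_for_node(row, col, n_rows, n_cols):
    """Return 'ip route' commands for node (row, col), numbered from 1.

    Assumed (hypothetical) addressing:
      row subnet r    -> 10.1.r.0/24, node (r, c) is 10.1.r.c
      column subnet c -> 10.2.c.0/24, node (r, c) is 10.2.c.r
    """
    cmds = []
    for r in range(1, n_rows + 1):
        if r != row:
            # Reach row subnet r via the node in our column that sits on row r.
            cmds.append(f"ip route add 10.1.{r}.0/24 via 10.2.{col}.{r}")
    for c in range(1, n_cols + 1):
        if c != col:
            # Reach column subnet c via the node in our row that sits on column c.
            cmds.append(f"ip route add 10.2.{c}.0/24 via 10.1.{row}.{c}")
    return cmds


# Print the table for one node of a small 2 x 3 example mesh.
for line in routes_for_node(1, 1, n_rows=2, n_cols=3):
    print(line)
```

Each node ends up with one route per foreign row and one per foreign
column, and every gateway is a machine it can already reach directly
through one of its own two switches.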

cheers,
robin

--
    Robin Humble       http://www.cita.utoronto.ca/~rjh/