LAM/MPI General User's Mailing List Archives

From: jess michelsen (jam_at_[hidden])
Date: 2004-05-27 05:57:47


Hi everyone!

For some time we have been running a 210-node cluster based on 2.4 GHz
Pentium and gigabit networking, running large-scale CFD calculations and
using LAM/MPI for communications. With great success, I should say.

We are now in the process of designing another cluster with 200 CPUs.
We are considering several alternatives, for instance 2-processor
Opteron 248 based nodes. The compute power of these nodes is about 3
times that seen on the 'old' cluster. If communications were
restricted to one gigabit NIC per node, this would be a severe
bottleneck.
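
As a rough sketch of the bottleneck argument above: the figures below
assume that the old nodes each have a single gigabit NIC, that the
"3 times" figure is per node, and that bonding scales bandwidth ideally.
None of these are measurements. The helper name bandwidth_per_compute
is hypothetical.

```python
def bandwidth_per_compute(nics, link_gbits=1.0, relative_compute=1.0):
    """Network bandwidth (Gbit/s) available per unit of relative compute power."""
    return nics * link_gbits / relative_compute


# Assumption: an old node has one gigabit NIC and compute power 1.0 (the baseline).
old = bandwidth_per_compute(nics=1, relative_compute=1.0)

# New dual-Opteron node: about 3x the compute power (figure from the post).
new_single = bandwidth_per_compute(nics=1, relative_compute=3.0)

# Assumption: two bonded NICs deliver exactly twice the bandwidth (best case).
new_bonded = bandwidth_per_compute(nics=2, relative_compute=3.0)

print(f"old node, 1 NIC:         {old:.2f} Gbit/s per compute unit")
print(f"new node, 1 NIC:         {new_single:.2f} Gbit/s per compute unit")
print(f"new node, 2 NICs bonded: {new_bonded:.2f} Gbit/s per compute unit")
```

Under these assumptions, even ideal bonding of two NICs leaves the new
nodes with less network bandwidth per unit of compute than the old ones.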

Hence, I'm considering channel bonding (some call it link aggregation).
As far as I understand, this can be done in two ways. In both
cases, the (two) NICs will share the same IP address. They are either
connected to two completely separate networks, or they are trunked.

Which method would give the best performance?

Presumably, we will use one of the Fedora Core releases.

Does anyone have comments on the performance to be expected (using LAM)?
One would hope to see the same latency and doubled bandwidth, compared
to using one NIC. Is there any chance we will end up anywhere near this
performance? Suppose the two procs on one node are to communicate with
two procs on different nodes: should they do this one after the other,
or both at the same time, or does it not matter?

Best regards, Jess Michelsen