One more thing to add to the good answers already provided:

If you're going to go with multiple NICs, you might want to put them on
different PCI buses. Specifically, if your apps need to pump a lot of
data across the network (presumably via MPI), you really want to be able
to stream as much data to each NIC as possible. Putting them on separate
buses won't eliminate all contention, but it will help a lot.
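
A rough back-of-envelope check of why the bus matters, using theoretical
peak figures only (assumed, not measured; real PCI and GigE throughput is
noticeably lower). The function names are illustrative:

```python
# Can one shared PCI bus keep two gigabit NICs busy?
# Assumption: theoretical peak rates only; PCI is treated as one transfer
# per clock cycle, and a full-duplex NIC moves data in both directions.

GIGE_BYTES_PER_SEC = 1_000_000_000 / 8   # one direction of one GigE link


def pci_peak_bytes_per_sec(width_bits, clock_mhz):
    """Theoretical peak of a parallel PCI bus: bus width times clock rate."""
    return width_bits / 8 * clock_mhz * 1_000_000


def bus_is_bottleneck(n_nics, width_bits, clock_mhz, full_duplex=True):
    """True if n_nics running flat out need more than the bus can carry."""
    per_nic = GIGE_BYTES_PER_SEC * (2 if full_duplex else 1)
    return n_nics * per_nic > pci_peak_bytes_per_sec(width_bits, clock_mhz)


print("2 NICs on 32-bit/33 MHz PCI:", bus_is_bottleneck(2, 32, 33))
print("2 NICs on 64-bit/66 MHz PCI:", bus_is_bottleneck(2, 64, 66))
```

Even the 64-bit/66 MHz case only just fits on paper (about 528 MB/s of
bus versus 500 MB/s of NIC traffic), and real bus efficiency is well
below peak, so two busy NICs sharing one bus will likely contend.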

Whether it pays off comes down to what kind of apps you have, and what
their communication patterns will be. For example, will you be sending
little messages to lots of peer processes, or sending a lot of data to a
small number of peers? Or, if you are going to be running a wide variety
of apps with different communication patterns, then you just have to plan
for the worst (which is usually the most expensive ;-) ).
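
To see why the pattern matters, here is a minimal sketch using the simple
latency/bandwidth model T(n) = alpha + n / beta. The numbers are
assumptions for illustration (50 us latency, 125 MB/s per GigE link), and
it assumes ideal bonding that doubles bandwidth but leaves latency alone:

```python
# Latency/bandwidth model: time to send n bytes is alpha + n / beta.
# Assumed figures, not measurements: alpha = 50 us, beta = 125 MB/s per
# link. Ideal bonding is modelled as multiplying beta by the link count.

ALPHA = 50e-6            # per-message latency in seconds (assumed)
BETA_ONE_NIC = 125e6     # bytes/second for one gigabit link (assumed)


def transfer_time(n_bytes, alpha=ALPHA, beta=BETA_ONE_NIC):
    """Modelled time to send one message of n_bytes."""
    return alpha + n_bytes / beta


def bonding_speedup(n_bytes, n_links=2):
    """Ratio of single-NIC time to ideally bonded time for one message."""
    single = transfer_time(n_bytes)
    bonded = transfer_time(n_bytes, beta=BETA_ONE_NIC * n_links)
    return single / bonded


for size in (64, 1024, 65536, 8 * 1024 * 1024):
    print(f"{size:>8} bytes: speedup {bonding_speedup(size):.2f}x")
```

Small messages are dominated by latency and gain almost nothing from a
second NIC, while large messages approach the ideal 2x; real bonding
overhead will pull both numbers down.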
On Thu, 27 May 2004, jess michelsen wrote:
> Hi everyone!
>
> We have for some time been running a 210 node cluster based on 2.4 GHz
> Pentium and gigabit networking, running large-scale CFD calculations and
> employing LAM-MPI for communications. With great success, I should say.
>
> We are now in the process of designing another cluster with 200 CPUs.
> We are considering several alternatives, for instance 2-processor
> Opteron 248 based nodes. The compute power of these nodes is about 3
> times that of the 'old' cluster. If communications were
> restricted to one gigabit NIC per node, this would be a severe
> bottleneck.
>
> Hence, I'm considering channel bonding (some call it link aggregation).
> As far as I have understood, this can be done in two ways. In both
> cases, the (two) NICs will share the same IP address. They are either
> connected to two completely separate networks, or they are trunked.
>
> Which method would give best performance?
>
> Presumably, we will use one of the Fedora cores.
>
> Does anyone have comments on the performance to be expected (using LAM)?
> One would hope to see the same latency and doubled bandwidth, compared
> to using one NIC. Is there any chance we will end up anywhere near this
> performance? Suppose the two procs on one node are to communicate to two
> procs on different nodes: should they do this one after the other, or
> both at the same time, or does it not matter?
>
> Best regards, Jess Michelsen
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/