LAM/MPI General User's Mailing List Archives

From: Pierre Valiron (Pierre.Valiron_at_[hidden])
Date: 2004-10-05 12:11:44


Hi,

I am not surprised by the poor performance improvement with 2 e1000 NICs.

You already need very good hardware to feed a single e1000 at full speed
under TCP/IP (the theoretical limit is 120 MB/s; the usual limit is about
80-90 MB/s, with close to 100% CPU utilization on most platforms for real
applications).
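As a rough back-of-the-envelope check (my own sketch, using the standard Ethernet/IP/TCP frame sizes, not figures from the original message), the ~120 MB/s theoretical limit follows directly from Gigabit Ethernet's per-frame overhead:

```python
# Rough estimate of the TCP payload bandwidth of a Gigabit Ethernet link.
# Frame sizes below are standard Ethernet/IPv4/TCP values; the 80-90 MB/s
# figure quoted above is an empirical observation, not derived here.

LINK_RATE_BITS = 1_000_000_000  # 1 Gbit/s signalling rate

MTU = 1500              # standard Ethernet payload (bytes)
ETH_OVERHEAD = 38       # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40     # IPv4 header 20 + TCP header 20 (no options)

wire_bytes_per_frame = MTU + ETH_OVERHEAD   # 1538 bytes on the wire
payload_per_frame = MTU - IP_TCP_HEADERS    # 1460 bytes of TCP payload

link_rate_bytes = LINK_RATE_BITS / 8        # 125 MB/s raw
efficiency = payload_per_frame / wire_bytes_per_frame
payload_rate = link_rate_bytes * efficiency

print(f"raw link rate : {link_rate_bytes / 1e6:.1f} MB/s")
print(f"TCP payload   : {payload_rate / 1e6:.1f} MB/s")
```

This gives a TCP payload rate of about 118-119 MB/s, consistent with the ~120 MB/s ceiling; the gap down to the observed 80-90 MB/s is CPU and protocol-stack overhead, not the wire.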

Bonding multiple NICs should be more effective with interconnects that
consume fewer CPU cycles. Does anybody have experience bonding Myrinet,
InfiniBand, etc., between powerful 64-bit SMP machines (quad Opterons or
Power5)?

Another application of multiple NICs would be support for cheaper
topologies. For instance, it may prove very expensive to buy a flat switch
spanning 64 nodes. Supporting multiple NICs might make it possible to
interconnect several switches via the nodes themselves, to attach several
nodes to a single switch port in a chained fashion, or even to build small
rings with NO switches.

For instance, assuming all nodes possess 2 NICs, one might create a
cheap ring connecting

        node 1 to nodes N and 2
        node i to nodes i-1 and i+1
        node N to nodes N-1 and 1

which might be usable provided the network is fast (Myrinet, InfiniBand,
etc.) and N is small enough. This is the topology used by the SGI Altix 350
(involving a maximum of 8 nodes), with excellent performance.
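A minimal sketch (my own illustration, using the 1-based node numbering above) of computing each node's two ring neighbours:

```python
# Ring topology for N nodes numbered 1..N, each with 2 NICs:
# node i talks to node i-1 on one NIC and node i+1 on the other,
# with wrap-around so node 1 pairs with node N.

def ring_neighbours(i, n):
    """Return the (left, right) neighbours of node i in a ring of n nodes."""
    left = n if i == 1 else i - 1
    right = 1 if i == n else i + 1
    return left, right

N = 8  # e.g. the maximum ring size of the SGI Altix 350 mentioned above
for i in range(1, N + 1):
    left, right = ring_neighbours(i, N)
    print(f"node {i}: NIC0 -> node {left}, NIC1 -> node {right}")
```

Each hop between non-adjacent nodes must be forwarded through intermediate nodes, so the worst-case path length grows with N/2, which is why such rings only pay off when N is small and the links are fast.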

I doubt this type of configuration is supported in the present LAM/MPI.

What does the future hold with Open-MPI?

Best.
Pierre.

On Tue, 5 Oct 2004, Davide Cesari wrote:

> Just to share my experience on this subject: I recently tested LAM-MPI
> on a double-network cluster (Supermicro 2-Xeon boards with 2 e1000 NICs
> and 2 Gigabit switches), but I did not get a bandwidth improvement from
> the second network. In more detail:
>
> 1. Using channel bonding I did not get any improvement in bandwidth even
> with ttcp, so I didn't bother testing MPI.
>
> 2. Using different communication paths between different nodes I got a
> slight improvement in ttcp bandwidth (node A communicating
> simultaneously with nodes B and C over the 2 different networks gave an
> overall bandwidth a little higher than A-B communication alone, but
> far from double); unfortunately, LAM-MPI in this case (with lamboot -l
> and everything set up according to Tim Mattox's suggestions) didn't give
> any performance improvement in a send/receive test, although I verified
> that both networks were being used.
>
> I cannot swear that I did everything right, but if that is the case, I
> feel that the bottleneck preventing the bandwidth increase is in the
> hardware (or in the kernel), because a previous test with LAM on a
> double 100 Mbit/s network (whose results are in a mailing list message
> cited by Mark) was positive.
> Bye, Davide
>
>

-- 
Support the SAUVONS LA RECHERCHE movement:
nationally:  http://recherche-en-danger.apinc.org/
in Grenoble: http://recherchegrenoble.free.fr and http://etatsg.free.fr
       _/_/_/_/    _/       _/       Dr. Pierre VALIRON
      _/     _/   _/      _/   Laboratoire d'Astrophysique (UMR 5571 CNRS)
     _/     _/   _/     _/    Observatoire de Grenoble / U. Joseph Fourier
    _/_/_/_/    _/    _/         BP 53  F-38041 Grenoble Cedex 9 (France)
   _/          _/   _/                                   
  _/          _/  _/        http://www-laog.obs.ujf-grenoble.fr
 _/          _/ _/       mailto:Pierre.Valiron_at_[hidden]
_/          _/_/      Phone / Fax: +33 (0)4 76.51.47.87 / (0)4 76.44.88.21