
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-01-13 02:08:17


On Wed, 7 Jan 2004, Robert Piontek wrote:

> I was recently able to measure network bandwidth using NetPIPE at about
> 90 Mbps (100 Mbps network cards) with both the raw TCP and MPI modules.
> When I do some performance tests myself, with the code I'll be using,
> the numbers come out about a factor of 10 smaller. I'm wondering if
> there are any suggestions as to where to start looking for the problem.
>
> We are running Red Hat Linux 9, using LAM 7. My code is written in
> Fortran 77, compiled with the Portland Group compiler. I've been
> measuring the performance with a profiling tool called Vampir. The code
> I run for the test is very simple, basically consists of starting up MPI
> and just doing a send_rec, then exiting. With Vampir I can open up the
> trace file and see how long it took to perform the send_rec, and also
> the size of the message. For large message sizes of 100k or so, I'm
> only seeing about 10 Mbps. I don't think Vampir adds much overhead to
> the timing, as it didn't seem to have much effect in full scale tests of
> the code.

NetPIPE is actually quite sophisticated in how it measures bandwidth
-- it does lots of repetitions, sometimes pre-posts receives, makes the
inner loop as small as possible, attempts to overlap communication and
computation, etc.
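
To make the "lots of repetitions, small inner loop" idea concrete, here is a
minimal sketch of a ping-pong timing loop. It is not NetPIPE's code and it
does not use MPI: it assumes a local socket pair and a helper thread as
stand-ins for the two endpoints, and the names pingpong_mbps, echo_peer and
recv_exact are made up for illustration. It reports the very first round
trip separately from the best of the repeated ones, since a one-shot
measurement is the kind of number a single traced exchange gives you.

```python
import socket
import threading
import time


def recv_exact(sock, nbytes):
    """Read exactly nbytes from sock (a single recv may return fewer)."""
    buf = bytearray(nbytes)
    view = memoryview(buf)
    got = 0
    while got < nbytes:
        n = sock.recv_into(view[got:], nbytes - got)
        if n == 0:
            raise ConnectionError("peer closed the connection early")
        got += n
    return buf


def echo_peer(sock, nbytes, rounds):
    """Second endpoint (hypothetical stand-in): bounce each message back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, nbytes))


def pingpong_mbps(nbytes=100_000, warmup=5, reps=50):
    """Return (first_mbps, best_mbps) for round trips of nbytes each way."""
    a, b = socket.socketpair()
    peer = threading.Thread(target=echo_peer, args=(b, nbytes, warmup + reps))
    peer.start()
    payload = bytes(nbytes)
    times = []
    # Keep the timed loop as small as possible: send, receive, record.
    for _ in range(warmup + reps):
        t0 = time.perf_counter()
        a.sendall(payload)
        recv_exact(a, nbytes)
        times.append(time.perf_counter() - t0)
    peer.join()
    a.close()
    b.close()
    # One round trip moves nbytes in each direction.
    bits = 2 * nbytes * 8
    first = bits / times[0] / 1e6
    # Discard the warm-up rounds and keep the fastest remaining one.
    best = bits / min(times[warmup:]) / 1e6
    return first, best


if __name__ == "__main__":
    first, best = pingpong_mbps()
    print(f"first round trip: {first:.1f} Mbps, best of repeats: {best:.1f} Mbps")
```

The absolute numbers from a local socket pair say nothing about your
network; the point is only the shape of the loop and the gap that can show
up between the first exchange and the repeated ones.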

The business of benchmarking is quite difficult; there are many pitfalls
and common problems that arise when trying to make seemingly simple
measurements.
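
One such pitfall can be shown with plain arithmetic. The sketch below
assumes "100k" means 100,000 bytes and assumes, purely for illustration,
that the gap between 90 Mbps and 10 Mbps comes from a fixed one-time cost
(startup, connection setup, or similar) rather than from the wire itself.
Nothing in the measurements above confirms that this is the actual cause;
the function names are made up.

```python
def transfer_seconds(nbytes, mbps):
    """Time to move nbytes at a sustained rate of mbps megabits per second."""
    return nbytes * 8 / (mbps * 1e6)


def apparent_mbps(nbytes, peak_mbps, overhead_s, messages=1):
    """Bandwidth a timer would report if a fixed overhead is paid once."""
    total = overhead_s + messages * transfer_seconds(nbytes, peak_mbps)
    return messages * nbytes * 8 / total / 1e6


NBYTES = 100_000                            # assumed meaning of "100k"
wire = transfer_seconds(NBYTES, 90)         # time at the NetPIPE rate
observed = transfer_seconds(NBYTES, 10)     # time implied by the 10 Mbps figure
overhead = observed - wire                  # hypothetical fixed cost

if __name__ == "__main__":
    print(f"wire time:     {wire * 1e3:.2f} ms")
    print(f"observed time: {observed * 1e3:.2f} ms")
    print(f"implied fixed overhead: {overhead * 1e3:.2f} ms")
    print(f"single message:  {apparent_mbps(NBYTES, 90, overhead):.1f} Mbps")
    print(f"100 messages:    {apparent_mbps(NBYTES, 90, overhead, 100):.1f} Mbps")
```

Under these assumptions, a single timed exchange is dominated by the fixed
cost, while repeating the exchange many times amortizes it and the reported
figure moves back toward the sustained rate.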

I don't actually have a silver bullet solution that will help you -- my
best suggestion is to look at the NetPIPE code in detail (can you
run that through Vampir?) and see all the tricks that it uses.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/