
LAM/MPI General User's Mailing List Archives


From: Robin Humble (rjh_at_[hidden])
Date: 2003-06-04 23:16:04


On Wed, Jun 04, 2003 at 11:30:10PM -0400, Peter McLachlan wrote:
>nodes at a time. I'd love to see a meshed implementation that allowed you
>to test bandwidth simultaneously between N nodes. (I know, I know, I
>should stop complaining and write one :P)

the PMB benchmark does this.
  http://www.pallas.com/e/products/pmb/

>running is the Linpack cluster benchmark. (HPL == High Performance
>Linpack I believe). It should scale well I would think. The latency

it scales fairly linearly.

>numbers seem to be fine from netPIPE, latency for a 131kb message for
>instance is 0.001512s which seems reasonable.

There are a million tunables in the HPL code; you'll probably have to try
many of them before you get a good set for your particular machine.
The HPL tuning guide will help. The first step is to choose a problem
size that uses (almost) all the memory.
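As a rough sketch of that first step: HPL factors an NxN matrix of 8-byte
doubles, so a common rule of thumb is N ~ sqrt(fraction * total_memory / 8),
with the fraction around 0.8 to leave room for the OS and MPI. The node count
and memory per node below are made-up examples, not anyone's actual cluster:

```shell
# Rule-of-thumb HPL problem size (example numbers, adjust for your cluster).
NODES=16                # hypothetical node count
MEM_PER_NODE_GB=2       # hypothetical memory per node
TOTAL_BYTES=$((NODES * MEM_PER_NODE_GB * 1024 * 1024 * 1024))

# N such that the NxN matrix of 8-byte doubles fills ~80% of total memory.
N=$(awk -v b="$TOTAL_BYTES" 'BEGIN { printf "%d", sqrt(0.80 * b / 8) }')
echo "suggested HPL problem size N: $N"
```

You'd then round N to a multiple of the block size NB you put in HPL.dat.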

Also, if you are aiming for top500.org rather than just running for fun,
you should be using Intel's BLAS (MKL), as it's much faster than ATLAS.
There are other tweaks out there, but I'll leave you to find those :)
Turning off hyperthreading (boot with 'noht') will also probably help.

Are you running with something like 'mpirun C ...'?
Locality is important for most parallel codes, and the C option puts
neighbouring processes on the same node.
Also, the -O option is important for lam-6.5.9, but lam-7 auto-detects
this, I believe.
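Putting that together, a typical LAM/MPI launch looks roughly like the
following (the hostfile name and xhpl path are placeholders; this is just a
sketch of the commands, not a tested recipe for any particular cluster):

```shell
# Boot the LAM run-time on the machines listed in the boot schema.
lamboot hostfile

# C = one process per CPU in the schema, filling each node before moving on,
# so neighbouring ranks land on the same node. -O marks the cluster as
# homogeneous (needed explicitly on lam-6.5.9).
mpirun -O C ./xhpl

# Shut the LAM run-time down when the run is finished.
lamhalt
```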

cheers,
robin