On Fri, May 16, 2003 at 06:01:56AM +0000, Gareth Pearce wrote:
>We have been benchmarking performance to ensure that we have everything
>set up correctly and have found some rather peculiar results.
>1. 1 xeon - 1 processor - 23 seconds a step
>2. 2 xeons - 1 processor each - 19 seconds a step
>3. 1 xeon - 2 processors - 19 seconds a step
>4. 3 xeons - 2 processors each - 13 seconds a step
>The fact that 2 and 3 are the same is worrying. lam-mpi is version 6.5.9
>and was configured with rpi=usysv - ifc/icc was the base compiler. Our
>copy of CPMD is linked against intel math libraries. It was also compiled
>with ifc.
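For reference, the quoted timings can be turned into speedup and parallel
efficiency figures. This is only a rough Python sketch: the labels and the
helper name `speedup_table` are made up for illustration, and the inputs are
just the four seconds-per-step values quoted above.

```python
# Seconds per step from the quoted benchmark: (label, total processes, seconds).
RUNS = [
    ("1 node x 1 proc", 1, 23.0),
    ("2 nodes x 1 proc", 2, 19.0),
    ("1 node x 2 procs", 2, 19.0),
    ("3 nodes x 2 procs", 6, 13.0),
]

def speedup_table(runs):
    """Return (label, speedup, efficiency) relative to the first (serial) run."""
    base = runs[0][2]
    table = []
    for label, nprocs, seconds in runs:
        speedup = base / seconds
        table.append((label, speedup, speedup / nprocs))
    return table

if __name__ == "__main__":
    for label, s, e in speedup_table(RUNS):
        print(f"{label:18s} speedup {s:4.2f}  efficiency {e:4.2f}")
```

Two processes give roughly a 1.2x speedup either way and six give under 1.8x,
so scaling is poor in every configuration, not just the dual-cpu one.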
Are you using any of the multi-threaded routines in the Intel math
libraries? These could be using both cpus without you being aware of it,
as they do SMP by default. Try setting OMP_NUM_THREADS=1 (I think that's
the name of the env variable). If you are inadvertently doing MPI +
OpenMP then you may well have more threads running than you have cpus,
which would likely be slow.
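One way to try this without touching your shell setup is a small wrapper that
launches the job with the variable forced to 1. This is a minimal sketch; the
helper name `single_thread_env` is made up, and the command to run is whatever
you normally use to start the benchmark. Whether the variable reaches
processes on remote nodes depends on how the MPI launcher forwards the
environment, so check that separately.

```python
import os
import subprocess
import sys

def single_thread_env(base=None):
    """Copy an environment and pin OpenMP to one thread per process."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env

if __name__ == "__main__":
    # Hypothetical usage: pass the real launch command as arguments,
    # e.g. the mpirun invocation used for the benchmark.
    if len(sys.argv) > 1:
        sys.exit(subprocess.call(sys.argv[1:], env=single_thread_env()))
```

If the step time for the dual-cpu case changes noticeably with this set, the
math library threading was part of the problem.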
Do you have hyperthreading turned on for the Xeons?
All Linux versions that I know of aren't ht-aware enough to schedule
code properly. This means that, e.g., 2 threads on a dual Xeon system can
both be put onto the same physical cpu, and overall your code runs slower...
Additionally, _really_ good code like that in the Intel math libs, or
trivially simple vectorised code, may be slowed down by hyperthreading, as
you are already maxing out the fp units on the chip without hammering them
with an additional thread.
Try booting with 'noht' or turning off ht in the bios.
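To see whether ht is currently on, something like the following Python sketch
may help. It assumes a kernel whose /proc/cpuinfo exposes both the `siblings`
and `cpu cores` fields (older kernels may not, in which case it returns None);
the function name `ht_enabled` is made up for illustration.

```python
import os

def ht_enabled(cpuinfo_text):
    """Return True/False if hyperthreading looks on/off, None if unknown."""
    siblings = cores = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "siblings":
            siblings = int(value)      # logical cpus per physical package
        elif key == "cpu cores":
            cores = int(value)         # physical cores per package
    if siblings is None or cores is None:
        return None
    # More logical cpus than cores means each core runs several threads.
    return siblings > cores

if __name__ == "__main__":
    if os.path.exists("/proc/cpuinfo"):
        with open("/proc/cpuinfo") as f:
            print("hyperthreading:", ht_enabled(f.read()))
```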
p4s in general (including p4 Xeons) also have broken irq distribution,
which means all irqs arrive on cpu0 unless you have patched the
kernel with an irq balancer. If you have lots of irqs arriving then this
could be an issue... cat /proc/interrupts to check.
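If eyeballing the table is awkward, the per-cpu totals can be summed with a
few lines of Python. This is a sketch that assumes the usual layout (a header
row of CPU names, then one row per irq with one count per cpu); the function
name `irq_totals` is made up for illustration.

```python
import os

def irq_totals(text):
    """Sum interrupt counts per cpu from /proc/interrupts-style text."""
    lines = text.strip().splitlines()
    cpus = lines[0].split()            # header row: CPU0 CPU1 ...
    totals = [0] * len(cpus)
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        fields = rest.split()[:len(cpus)]
        # Rows such as ERR carry a single count; only sum full per-cpu rows.
        if len(fields) == len(cpus) and all(f.isdigit() for f in fields):
            for i, f in enumerate(fields):
                totals[i] += int(f)
    return dict(zip(cpus, totals))

if __name__ == "__main__":
    if os.path.exists("/proc/interrupts"):
        with open("/proc/interrupts") as f:
            for cpu, total in irq_totals(f.read()).items():
                print(cpu, total)
```

If nearly everything lands in the CPU0 column while the job is running, the
irq distribution issue described above is probably in play.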
The pipeline depth of the Xeon is also _huge_, which slows down lots of
simple operations (e.g. gettimeofday, routing, ...). The Athlon has a
much shorter pipeline. LAM seems to call gettimeofday a lot.
Any one of these could explain what you are seeing...
As a first step I'd try LAM with each of the tcp, sysv and usysv rpis
(lam-7 makes this easy :-)
cheers,
robin