LAM/MPI General User's Mailing List Archives

From: Eric Swenson (eric_at_[hidden])
Date: 2005-04-22 16:31:58


NUMA support in the SUSE Enterprise 9 64-bit kernel is *halfway*
decent. It certainly doesn't do the best job possible, but it does at
least provide a working interface to NUMA.

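Make sure to enable bank interleaving but disable node interleaving in
your BIOS; with node interleaving on, memory is striped across both
nodes and the kernel sees only a single node. If you see an ACPI NUMA
SRAT (Static Resource Affinity Table) option in your BIOS, I would
enable that too.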

To see whether NUMA is enabled in your kernel, type "numastat".

For example, on a dual-CPU Opteron node with node interleaving disabled,
you should see something like:

node08:~ # numastat
                   node1       node0
numa_hit       101025574   101451393
numa_miss        1711047        5067
numa_foreign        5067     1711047
interleave_hit         0           0
local_node     101024970   101451375
other_node       1711651        5085
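Reading the counters: numa_hit counts allocations satisfied on the node that was asked for, numa_miss counts allocations that landed on this node although another node was preferred, and numa_foreign is the mirror image (memory meant for this node that ended up elsewhere). A minimal Python sketch, assuming the values are copied by hand from the output above (the name `miss_percent` is hypothetical), turns them into a per-node miss rate:

```python
# Counter values copied by hand from the numastat output above.
COUNTERS = {
    "node1": {"numa_hit": 101025574, "numa_miss": 1711047, "numa_foreign": 5067},
    "node0": {"numa_hit": 101451393, "numa_miss": 5067, "numa_foreign": 1711047},
}


def miss_percent(stats: dict) -> float:
    """Percentage of this node's allocations that were meant for another node.

    Every allocation served by a node increments either numa_hit or
    numa_miss on that node, so hit + miss is the node's total count.
    """
    total = stats["numa_hit"] + stats["numa_miss"]
    return 100.0 * stats["numa_miss"] / total if total else 0.0


if __name__ == "__main__":
    for node, stats in COUNTERS.items():
        print(f"{node}: {miss_percent(stats):.3f}% misses")
    # A miss on one node is a "foreign" allocation on the other node,
    # so on a two-node box the cross counters should mirror each other.
    assert COUNTERS["node1"]["numa_miss"] == COUNTERS["node0"]["numa_foreign"]
```

For the figures above this gives about 1.7% misses on node1 and well under 0.01% on node0.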

For testing, you can control NUMA parameters with "numactl":
node08:~ # numactl --show
policy: default
preferred node: 0
interleavemask:
interleavenode: 0
nodebind: 0 1
membind: 0 1
node08:~ # numactl --hardware
available: 2 nodes (0-1)
node 0 size: 4095 MB
node 0 free: 2899 MB
node 1 size: 4095 MB
node 1 free: 3015 MB
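If you want to check how much memory each node has left before binding a job to it, output in the format shown above is easy to scrape. A minimal sketch, assuming the "node N size/free: M MB" line format printed here (any other lines are ignored; `parse_hardware` is a hypothetical name):

```python
import re

# Sample text copied from the "numactl --hardware" output above.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 size: 4095 MB
node 0 free: 2899 MB
node 1 size: 4095 MB
node 1 free: 3015 MB
"""

# Matches lines such as "node 0 size: 4095 MB" or "node 1 free: 3015 MB".
LINE_RE = re.compile(r"^node (\d+) (size|free): (\d+) MB$")


def parse_hardware(text: str) -> dict:
    """Return {node: {"size": MB, "free": MB}} from numactl --hardware output."""
    nodes = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            node, field, mb = int(m.group(1)), m.group(2), int(m.group(3))
            nodes.setdefault(node, {})[field] = mb
    return nodes


if __name__ == "__main__":
    for node, mem in sorted(parse_hardware(SAMPLE).items()):
        used = mem["size"] - mem["free"]
        print(f"node {node}: {used} MB in use of {mem['size']} MB")
```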

Then there is "numademo", which actually measures the various memory
bandwidths on your system and reports them. On a modern DDR400
dual-Opteron board such as the Arima HDAMA, expect roughly 1600 MB/s to
the far node and 2200 MB/s to the local node.
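The gap between those two figures is easy to quantify. Taking the quoted numbers at face value, and assuming an idealised, purely bandwidth-bound loop that streams a fixed data volume D in time t = D/B (real codes also depend on latency and cache behaviour):

```latex
\frac{t_{\text{remote}}}{t_{\text{local}}}
  = \frac{B_{\text{local}}}{B_{\text{remote}}}
  = \frac{2200\,\text{MB/s}}{1600\,\text{MB/s}} = 1.375,
\qquad
\frac{B_{\text{remote}}}{B_{\text{local}}} = \frac{1600}{2200} \approx 0.727
```

So such a loop would take up to about 1.375 times as long when its data lives on the far node, i.e. run at roughly 73% of local speed.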

Now, you may ask: what do you do with all this? For testing, you can
run a serial code and nail it to one CPU using "numactl -mX -cX
./mycode.exe", where X is 0, 1, etc. (-m binds its memory to node X and
-c binds it to node X's CPUs), and realize a speedup of around 10%. This
doesn't help for MPI code, however. In that case, you want to make calls
to libnuma:

 http://www.firstfloor.org/~andi/numa.html
(or just google for "libnuma").
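libnuma is a C library (declared in <numa.h>, linked with -lnuma). To stay in one language here, the sketch below calls a few of its documented functions (numa_available, numa_max_node, numa_run_on_node, numa_set_preferred) from Python through ctypes. The rank-to-node mapping is a hypothetical round-robin scheme, not something libnuma or LAM/MPI provides; it assumes contiguous node numbers and that the ranks on each host are numbered from 0.

```python
import ctypes
import ctypes.util


def load_libnuma():
    """Return a handle to libnuma, or None if it is missing or NUMA is unsupported."""
    path = ctypes.util.find_library("numa")
    if path is None:
        return None
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    lib.numa_available.restype = ctypes.c_int
    lib.numa_max_node.restype = ctypes.c_int
    lib.numa_run_on_node.argtypes = [ctypes.c_int]
    lib.numa_run_on_node.restype = ctypes.c_int
    lib.numa_set_preferred.argtypes = [ctypes.c_int]
    lib.numa_set_preferred.restype = None
    # numa_available() must be called first; it returns -1 without NUMA support.
    if lib.numa_available() < 0:
        return None
    return lib


def node_for_rank(rank: int, num_nodes: int) -> int:
    """Hypothetical round-robin mapping of per-host MPI ranks onto NUMA nodes."""
    return rank % num_nodes


def bind_rank(rank: int):
    """Run this process on one node and prefer that node's memory.

    Returns the chosen node, or None if libnuma is unavailable or binding fails.
    """
    lib = load_libnuma()
    if lib is None:
        return None
    # Assumes nodes are numbered 0..numa_max_node() without gaps.
    node = node_for_rank(rank, lib.numa_max_node() + 1)
    if lib.numa_run_on_node(node) != 0:
        return None
    lib.numa_set_preferred(node)  # later allocations favour this node
    return node
```

From mpi4py, for example, you might call bind_rank(MPI.COMM_WORLD.Get_rank()) before allocating large arrays, so that pages touched afterwards come from the local node where possible. In a C code the same calls can be made directly after MPI_Init.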

While I haven't tried it myself (yet), I've seen reports of 20-30%
speedups for some MPI codes on dual-Opteron systems when the code uses
libnuma calls to manage memory placement explicitly instead of relying
on the kernel. I imagine the benefit would be far higher on an 8-way
NUMA machine. I can attest to at least a 10% speedup on some old serial
CFD code at our company.

Good luck,
Eric

On Fri, 2005-04-22 at 21:55 +0200, milan_at_[hidden] wrote:
> >>>>> "Eugene" == Eugene BT <eugene.devilliers_at_[hidden]> writes:
>
> Eugene> have been non-local leading to the drop. As far as I know
> Eugene> though, the 2.6 smp kernel is NUMA aware, so perhaps node
> Eugene> interleaving is enabled in the BIOS. I will report back
> Eugene> once we have our own machines in any case.
>
> The one shipped with SUSE is not really working OK. Kernel 2.6 has a
> lot of possible variants, and not every one is good for your machine.
> We use Gentoo and compile kernels ourselves, although they are also
> patched and differ from the ones coming from Linus. Gentoo has a
> very active group of K8 developers, so just upgrading the kernel
> solved all problems for us. Just after we fixed these problems I moved
> for several months to GNU's birthplace, so I can't tell you what we
> have in the BIOS.
>
> Milan Hodoscek
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/