LAM/MPI General User's Mailing List Archives

From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-06-25 08:41:08


On Thu, 24 Jun 2004, Luke Palmer wrote:

> Would there be any advantage to starting four lam processes per
> node, rather than two, or is this just silly?

Apart from the processing units on the CPU that others have mentioned
before, there are some other potential "bottlenecks" that have to be
shared between the 2 threads on the same CPU: the cache and the main
memory bus.

For things like ATLAS, which are tuned to a certain cache size, putting
2 processes/threads per CPU means halving the cache size available to
each of them. The cache usage optimization is then thrown away, and
performance might even suffer due to cache thrashing.
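
To make the cache argument concrete, here is a rough sketch of how the
largest usable block size shrinks when two threads share one cache. It
is only a toy model, not how ATLAS actually picks its block sizes: the
function name, the 512 KiB cache size, the even split between threads
and the "three square tiles of doubles must fit" rule are all
assumptions made for illustration.

```python
import math

def max_square_tile(cache_bytes, sharers=1, elem_bytes=8, tiles=3):
    """Largest n such that `tiles` n-by-n tiles of `elem_bytes` elements
    fit in one sharer's portion of the cache (assumes an even split)."""
    share = cache_bytes // sharers
    return math.isqrt(share // (tiles * elem_bytes))

cache = 512 * 1024  # hypothetical 512 KiB cache, not a measured value

alone = max_square_tile(cache, sharers=1)   # one process per CPU
shared = max_square_tile(cache, sharers=2)  # two threads share the cache

print("tile edge with the cache to itself:", alone)
print("tile edge with the cache shared:   ", shared)
```

A kernel tuned for the larger tile keeps using it when only half of the
cache is really available, so its working set no longer fits.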

If the 2 processes (one per CPU) are already saturating the main
memory bus (Xeons have a shared memory bus), putting an extra
process/thread per CPU will increase the memory bus pressure. If the
memory cannot deliver/take data to/from the CPUs fast enough, the CPU
processing units will sit idle and the performance will decrease.
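
The memory bus argument can be sketched with a simple fair-share model.
The bandwidth figures below are made up for illustration, and the model
assumes the bus is split evenly and ignores any contention overhead, so
a real machine may well behave worse than this.

```python
def per_process_rate(n_procs, demand_per_proc, bus_bandwidth):
    """Bandwidth each process actually gets on an evenly shared bus."""
    return min(demand_per_proc, bus_bandwidth / n_procs)

def cpu_busy_fraction(n_procs, demand_per_proc, bus_bandwidth):
    """Fraction of time a process is computing rather than waiting
    for memory, in this simplified model."""
    return per_process_rate(n_procs, demand_per_proc, bus_bandwidth) / demand_per_proc

BUS = 3.2     # hypothetical shared bus bandwidth, GB/s
DEMAND = 1.6  # hypothetical demand of one process, GB/s

for n in (2, 4):
    busy = cpu_busy_fraction(n, DEMAND, BUS)
    total = n * per_process_rate(n, DEMAND, BUS)
    print(f"{n} processes: busy fraction {busy:.2f}, total {total:.1f} GB/s")
```

With these numbers two processes already use the whole bus, so going to
four gives each of them half the bandwidth and no extra total
throughput.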

HT-aware scheduling (present even in recent 2.4 kernels, not only in
2.6 as was suggested before) makes decisions about moving
processes/threads between logical CPUs every few seconds. This is a
problem if the processes/threads have a lifetime of the same order
of magnitude, but quite a lot of scientific applications run for
hours or days, in which case a few seconds of CPU time lost because of
a wrong scheduling decision are not so bad.
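
As a back-of-the-envelope check of the scheduling point, the sketch
below compares the relative cost of one bad placement for a short job
and for a long one. The 3 seconds lost per bad decision and the job
lengths are assumed values, not measurements.

```python
def lost_fraction(run_seconds, bad_placements, seconds_lost_each):
    """Share of the total run time lost to wrong scheduling decisions."""
    return min(1.0, bad_placements * seconds_lost_each / run_seconds)

short_job = lost_fraction(run_seconds=10, bad_placements=1, seconds_lost_each=3)
long_job = lost_fraction(run_seconds=12 * 3600, bad_placements=1, seconds_lost_each=3)

print(f"10 s job:  {short_job:.1%} of the run lost")
print(f"12 h job:  {long_job:.4%} of the run lost")
```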

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]