On Fri, 2 Feb 2007, Ramon Diaz-Uriarte wrote:
> On Friday 02 February 2007 14:36, Tim Prince wrote:
>> jsquyres_at_[hidden] wrote:
>>> On Feb 2, 2007, at 6:02 AM, Davide Cesari wrote:
>>>> If the second holds, then your request is pointless; to my knowledge,
>>>> LAM does not do anything particular to attach processes to CPUs in an
>>>> SMP system, it just starts as many processes as requested, then it is up
>>>> to the operating system to balance them among the available processors,
>>>
>>> This is correct. LAM simply starts up the right number of processes
>>> and does not bind them to any particular CPUs.
>>>
>>>> this is the essence of Symmetric Multi Processing; AFAIK, there is no
>>>> such concept (and no need for one either) of starting a process on a
>>>> particular CPU in a plain SMP system.
>>>>
>>>> If you are using the Linux kernel, then recent versions should have a
>>>> tunable scheduler which tries to attach processes to CPUs as much as
>>>> possible (the so-called CPU affinity) to improve performance on SMP, but
>>>> it is not guaranteed either that a given process will always run on the
>>>> same CPU.
>>>> If you have a NUMA (Non Uniform Memory Access) system, then things are
>>>> more complex, but I have no direct experience of that.
>>
>> Most recent linux versions include a useful taskset command:
>> mpirun -np 8 taskset -c 8-15 ./a.out
>> which should be fairly effective at placing your processes on that group
>> of processors within each node. The purpose of using taskset usually is
>> to improve efficiency through cache or NUMA memory affinity, but it
>> could be used to do what OP appears to be requesting.
>
> Sorry, I must be missing something, but shouldn't this be something the OS
> does? I think I recall that last time I recompiled a Linux kernel (a 2.6 one,
> for AMD Opteron machine, about 6 months ago?) there was stuff related to
> NUMA. I'd feel better if someone doing kernel development takes care of this
> rather than having this responsibility myself :-).
You're only missing that Computers Suck (IMHO) :). There is an awful lot
of code in the Linux kernel to try to make NUMA machines more tolerable.
But it has its limitations -- it's designed to provide the best overall
machine "responsiveness", not the lowest latency for 2 of the 80 processes
running on the machine. MPI apps tend to want the second one -- a very
few processes should be privileged over all others.
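
For what it's worth, the pinning that taskset does from the command line can
also be done from inside a process. The following is only a rough sketch,
assuming Linux and Python 3; pin_to_cpus and cpu_for_rank are made-up helper
names, and how a process learns its rank is left open here (it would have to
come from the MPI library or the launcher, which this sketch does not assume).

```python
import os


def cpu_for_rank(rank, cpus):
    """Pick one CPU id for a (hypothetical) process rank, round-robin."""
    ordered = sorted(cpus)
    return ordered[rank % len(ordered)]


def pin_to_cpus(cpus, pid=0):
    """Restrict process `pid` (0 means the caller) to the given CPU ids.

    Linux only: uses the sched_setaffinity system call through the os module.
    Only CPUs the process is already allowed to use are kept.
    """
    allowed = os.sched_getaffinity(pid)
    wanted = set(cpus) & allowed
    if not wanted:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(pid, wanted)
    # Return the affinity mask the kernel reports after the change.
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    available = os.sched_getaffinity(0)
    rank = 0  # assumption: a real job would get this from its MPI layer
    print(pin_to_cpus([cpu_for_rank(rank, available)]))
```

This only tells the scheduler where the process may run. It does not give the
process priority over anything else already running on that CPU.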
Brian