LAM/MPI General User's Mailing List Archives

From: Ramon Diaz-Uriarte (rdiaz02_at_[hidden])
Date: 2007-02-05 16:07:22


On 2/5/07, Brian W. Barrett <brbarret_at_[hidden]> wrote:
> On Mon, 5 Feb 2007, Ramon Diaz-Uriarte wrote:
>
> > On 2/5/07, Brian W. Barrett <brbarret_at_[hidden]> wrote:
> >> On Fri, 2 Feb 2007, Ramon Diaz-Uriarte wrote:
> >>
> >>> On Friday 02 February 2007 14:36, Tim Prince wrote:
> >>>> jsquyres_at_[hidden] wrote:
> >>>>> On Feb 2, 2007, at 6:02 AM, Davide Cesari wrote:
> >>>>>> If the second holds, then your request is pointless: to my knowledge,
> >>>>>> LAM does not do anything in particular to attach processes to CPUs in
> >>>>>> an SMP system. It just starts as many processes as requested, and then
> >>>>>> it is up to the operating system to balance them among the available
> >>>>>> processors,
> >>>>>
> >>>>> This is correct. LAM simply starts up the right number of processes
> >>>>> and does not bind them to any particular CPUs.
> >>>>>
> >>>>>> this is the essence of Symmetric Multi-Processing; AFAIK, there is no
> >>>>>> such concept (and no need for one) as starting a process on a
> >>>>>> particular CPU in a plain SMP system.
> >>>>>>
> >>>>>> If you are using the Linux kernel, then recent versions should have a
> >>>>>> tunable scheduler which tries to keep processes attached to CPUs as
> >>>>>> much as possible (the so-called CPU affinity) to improve performance
> >>>>>> on SMP, but even then it is not guaranteed that a given process will
> >>>>>> always run on the same CPU.
> >>>>>> If you have a NUMA (Non-Uniform Memory Access) system, then things are
> >>>>>> more complex, but I have no direct experience of that.
> >>>>
> >>>> Most recent Linux versions include a useful taskset command:
> >>>> mpirun -np 8 taskset -c 8-15 ./a.out
> >>>> which should be fairly effective at placing your processes on that group
> >>>> of processors within each node. The purpose of using taskset usually is
> >>>> to improve efficiency through cache or NUMA memory affinity, but it
> >>>> could be used to do what OP appears to be requesting.
> >>>
> >>> Sorry, I must be missing something, but shouldn't this be something the OS
> >>> does? I think I recall that last time I recompiled a Linux kernel (a 2.6 one,
> >>> for an AMD Opteron machine, about 6 months ago?) there was stuff related to
> >>> NUMA. I'd feel better if someone doing kernel development takes care of this
> >>> rather than having this responsibility myself :-).
> >>
> >> You're only missing that Computers Suck (IMHO) :). There is an awful lot
> >> of code in the Linux kernel to try to make NUMA machines more tolerable.
> >> But it has its limitations -- it's designed to provide the best overall
> >> machine "responsiveness", not the lowest latency for 2 of the 80 processes
> >> running on the machine. MPI apps tend to want the second one -- a very
> >> few processes should be privileged over all others.
> >>
> >
> > Brian, thanks for your comments. I was obviously missing something
> > (the Computers Suck or a related concept, I guess :-); maybe I should
> > play around with taskset.
>
> 99.9% of the time, you won't even notice a problem with recent kernels and
> small Opteron machines (4 cores or so). But certain worksets will cause
> the kernel to do "dumb things" and as the number of cores grows, the
> kernel does a less brilliant job of keeping things under control (at
> least, that's what we've found on our quad socket, dual core machines).
>
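
As a rough illustration of the pinning that taskset does in the thread above, the same thing can be requested from inside a process on Linux. This is only a minimal sketch: it assumes a Linux system (Python's os.sched_setaffinity is not available elsewhere), and both the helper name pin_to_cpu and the RANK environment variable are hypothetical, not something LAM is known to provide. The rank is mapped round-robin onto the CPUs the process is allowed to use, so with more processes than cores several ranks end up sharing a core.

```python
import os


def pin_to_cpu(rank: int) -> int:
    """Pin the calling process to one CPU, chosen round-robin by rank.

    Hypothetical helper: 'rank' is whatever per-process index the
    launcher makes available; nothing here is LAM-specific.
    """
    # CPUs the current process is allowed to run on right now.
    allowed = sorted(os.sched_getaffinity(0))
    # Wrap around when there are more ranks than CPUs (oversubscription).
    cpu = allowed[rank % len(allowed)]
    # Restrict the calling process (pid 0 means "self") to that one CPU.
    os.sched_setaffinity(0, {cpu})
    return cpu


if __name__ == "__main__":
    # RANK is an assumed variable name; substitute whatever your
    # launcher or wrapper script actually exports.
    rank = int(os.environ.get("RANK", "0"))
    cpu = pin_to_cpu(rank)
    print(f"rank {rank} pinned to CPU {cpu}")
```

Pinning this way only fixes where a process may run; it does not give it priority over other processes scheduled on the same core.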

Thanks for the pointers. That is good to know. I understand I would
also want to manually tweak settings if I am forced to run many
more simultaneous MPI processes than cores (e.g., my lam-bhost.def says
"localhost cpu=20" on a dual-core, two-CPU machine)?

R.

> Brian
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
Ramon Diaz-Uriarte
Statistical Computing Team
Structural Biology and Biocomputing Programme
Spanish National Cancer Centre (CNIO)
http://ligarto.org/rdiaz