On 2005-06-15 05:50 (-0700), Tim Prince had pondered:
> > It's another issue, though, that the performance takes a big hit when I
> > use both processors on the same node. That behavior is common to LAM and
> > MPICH (on our system) and we've thought about a whole bunch of possible
> > reasons.
> >
> Are you raising a new topic, with a top post on a thread? It's entirely
> possible for the first process on a node to tie up the memory system, leaving
> no gain for the 2nd process. If you don't have enough RAM, or (since yours
> appears to be a 32-bit system) enough address space for 2 processes, this is
> nearly guaranteed.
>
Tim - Well, I thought I'd mention it in passing. But yeah, it's not
related to John M's problem; it's an entirely different issue. And no,
it's not a case of our code being starved of memory or address space.
I've discussed this with my colleagues and we've come up with various
possible reasons, including the one you've mentioned.
Here's a note I sent to Jeff yesterday - should've CCed this group on it,
my bad! (Adding to it: profiling the code did not help, though I didn't
look at cache misses and the like.)
+++++
On 2005-06-14 13:06 (-0500), Arvind Gopu had pondered:
> Hello Jeff-
>
> Thought I'd ping you with a follow-up to my note sent to John at Caltech.
>
> As I mentioned, I've noticed a dramatic decrease in performance when I
> use both processors in a 2-proc node. And we're talking about a simple
> (toy) MPI program that has a message-passing component. It's on AVIDD,
> and the behavior doesn't change much whether I use Myrinet or Ethernet,
> static or dynamic linking, etc.
>
> Let's say the serial program takes 4 mins. Running the parallel code on 4
> processors on 4 different nodes takes 1 min, whereas running on 4
> processors on 2 nodes takes almost 2 mins. I tried searching for similar
> experiences - I did find one relevant MPICH-Myrinet webpage where they
> talked about mem-copy issues and advised using Myrinet for intra-node
> communication too, but that did not solve the problem (neither with
> MPICH nor with LAM, where I tried different SSI RPIs).
>
> I've discussed this with a few people within UITS and we could think of
> cache-thrashing type issues, bus-related limitations, IO pipe bottlenecks,
> etc. (a few other things I can't remember off the top of my head). But I
> am not sure if we're missing something more important or something obvious.
>
> If I had to explain to Joe User who asks (someone did): "why is my almost
> trivially parallelizable program not showing the expected speedup?", would
> this (silly) analogy make sense: "Subway might have a bunch of people
> working, more than one knife, etc. But since they have only one toaster,
> it might take longer if you wanted your sub toasted" :-)
>
> Sorry about that silly analogy, but a bit of humor usually does not hurt!
>
> cheers, Arvind
+++++
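
To put numbers on the timings in the note above, here is a minimal sketch of the speedup and parallel-efficiency arithmetic. The function names are made up for illustration, and the times are the rough figures quoted in the note (the 2-node run is "almost 2 mins", rounded to 2 here), not fresh measurements.

```python
def speedup(serial_minutes, parallel_minutes):
    """Speedup = serial run time / parallel run time."""
    return serial_minutes / parallel_minutes


def efficiency(serial_minutes, parallel_minutes, processes):
    """Parallel efficiency = speedup / number of processes (1.0 is ideal)."""
    return speedup(serial_minutes, parallel_minutes) / processes


SERIAL_MINUTES = 4.0  # rough serial run time from the note
PROCESSES = 4

# Rough parallel run times from the note, in minutes.
runs = {
    "4 procs on 4 nodes": 1.0,
    "4 procs on 2 nodes": 2.0,  # "almost 2 mins", rounded up
}

for label, minutes in runs.items():
    s = speedup(SERIAL_MINUTES, minutes)
    e = efficiency(SERIAL_MINUTES, minutes, PROCESSES)
    print(f"{label}: speedup {s:.1f}x, efficiency {e:.0%}")
```

With these rounded figures, the one-process-per-node run scales about ideally, while packing two processes per node gives roughly half the efficiency for the same process count.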
_____________________________________________________________________
Arvind Gopu | High Performance Computing Group| (UITS-RAC-HPC) @ IU
HPC website: http://www.indiana.edu/~rac/hpc | Work: (812) 856-0187
My website: http://cs.indiana.edu/~agopu | Cell: (812) 361-4054