LAM/MPI General User's Mailing List Archives

From: Manish Chablani (mchablan_at_[hidden])
Date: 2003-04-02 18:10:34


Hi,

There are many other factors that might have influenced your measured
performance, such as:
- RAM size
- cache size
- what exactly your program is doing (Is it continually hitting memory?
Is it causing millions of cache reloads in the single-CPU case? Is it
thrashing the RAM in the single-CPU case?)
In general, process scheduling will definitely thrash more on the 1-CPU
box, because all four processes have to time-share a single CPU.

I would advise you to study how your program behaves (how much memory it
uses and how frequently it references memory) and work out the cause
based on the factors above.
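One way to follow that advice on Linux is to have each process report a few kernel counters around its compute phase. The sketch below uses Python's standard `resource` module, a thin wrapper around POSIX `getrusage()`; the same fields are available from C via `getrusage(RUSAGE_SELF, ...)`. It is an illustration under stated assumptions: `touch_memory` is a hypothetical placeholder for the real solver step, and the dictionary key names are made up for readability. Roughly, a large `max_rss` summed over all ranks relative to physical RAM, or a climbing `major_faults` count, suggests RAM thrashing, while a large `involuntary_ctx` count suggests the processes are fighting over the single CPU. Some of these fields are only maintained by newer Linux kernels, so on a 2.4 kernel they may read as zero.

```python
import resource


def touch_memory(n_items: int, passes: int) -> int:
    """Hypothetical stand-in for one rank's compute phase.

    Writes and re-reads a list of n_items integers `passes` times so the
    process has a measurable memory footprint and page-fault count.
    """
    buf = [0] * n_items
    total = 0
    for p in range(passes):
        for i in range(n_items):
            buf[i] = i + p
            total += buf[i]
    return total


def usage_snapshot() -> dict:
    """Return selected getrusage() counters for the calling process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "max_rss": ru.ru_maxrss,          # peak resident set size, kilobytes on Linux
        "minor_faults": ru.ru_minflt,     # page faults served without disk I/O
        "major_faults": ru.ru_majflt,     # page faults that needed disk I/O (e.g. swapping)
        "voluntary_ctx": ru.ru_nvcsw,     # process gave up the CPU (e.g. blocked waiting)
        "involuntary_ctx": ru.ru_nivcsw,  # process was preempted by the scheduler
    }


def report(label: str, before: dict, after: dict) -> dict:
    """Print and return how much each counter grew between two snapshots."""
    delta = {k: after[k] - before[k] for k in before}
    delta["max_rss"] = after["max_rss"]  # a high-water mark, not a running counter
    print(f"{label}: " + ", ".join(f"{k}={v}" for k, v in delta.items()))
    return delta


if __name__ == "__main__":
    before = usage_snapshot()
    touch_memory(1_000_000, 3)
    after = usage_snapshot()
    report("rank 0", before, after)
```

Running the same 4-process job on both boxes and comparing these numbers per rank should show which of the factors listed above actually differs between the two machines.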

hope this helps,
Manish Chablani
------------------------------------------------------
Graduate Student, CS Department, Indiana University.
http://www.cs.indiana.edu/~mchablan

LAM/MPI Developer
Make today a LAM/MPI day !!!
http://www.lam-mpi.org
------------------------------------------------------

On Mon, 31 Mar 2003, Tony Caola wrote:

>
> Bear with me if the answer to this is in the archives. . . :-)
>
> I would like to run a parallel code of mine on a single-processor machine for the purposes of troubleshooting and prepping production runs. However, I am seeing *terrible* performance on a simulated '4-processor' run -- relative to running the same '4-processor' code on a much slower dual SMP machine. Some details follow, but my basic question is this: Are single-CPU linux boxes just awful at simulating multi-processor runs -- so I'm simply out-of-luck?
>
> A tale of 2 boxes:
>
> Box 1: Pentium 4 2.4b GHz-based system with DDR333 and 533 MHz FSB
> Linux kernel: 2.4.18
> (approximate memory bandwidth: 1200MB/sec -- based on STREAM benchmark)
> LAM 6.5.9 with sysv RPI
> Time for 4-processor solver test: 150 seconds
> (Note: same performance observed under LAM 6.3.2)
>
> Box 2: Dual-processor Pentium 3 Xeon 500 MHz-based system
> Linux kernel: 2.4.2
> (approximate memory bandwidth: 300MB/sec -- based on STREAM benchmark)
> LAM 6.3.2 with usysv RPI
> Time for (same) 4-processor solver test: 20 seconds
>
> Since the single processor machine has ~twice the processing speed of the dual machine, I expected that (with some degradation) the timings would be about equal. I am stunned that the faster machine is taking an order of magnitude longer than the dual-processor box for the same '4-processor' run. Has anyone seen this kind of poor performance running multi-processor code on a uniprocessor? Any suggestions?
>
> Thanks,
>
> Tony