Bear with me if the answer to this is in the archives. . . :-)
I would like to run a parallel code of mine on a single-processor machine for troubleshooting and for prepping production runs. However, I am seeing *terrible* performance on a simulated '4-processor' run -- relative to running the same '4-processor' code on a much slower dual SMP machine. Some details follow, but my basic question is this: Are single-CPU Linux boxes just awful at simulating multi-processor runs -- so that I'm simply out of luck?
A tale of 2 boxes:
Box 1: Pentium 4 2.4b GHz-based system with DDR333 and 533 MHz FSB
Linux kernel: 2.4.18
(approximate memory bandwidth: 1200MB/sec -- based on STREAM benchmark)
LAM 6.5.9 with sysv RPI
Time for 4-processor solver test: 150 seconds
(Note: same performance observed under LAM 6.3.2)
Box 2: Dual-processor Pentium 3 Xeon 500 MHz-based system
Linux kernel: 2.4.2
(approximate memory bandwidth: 300MB/sec -- based on STREAM benchmark)
LAM 6.3.2 with usysv RPI
Time for (same) 4-processor solver test: 20 seconds
Since the single-processor machine has roughly twice the processing speed of the dual machine, I expected that (with some degradation) the timings would be about equal. I am stunned that the faster machine is taking nearly an order of magnitude longer (150 seconds versus 20) than the dual-processor box for the same '4-processor' run. Has anyone seen this kind of poor performance running multi-processor code on a uniprocessor? Any suggestions?
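For reference, here is the back-of-the-envelope arithmetic behind that expectation, as a rough sketch. It assumes raw clock rate summed over CPUs is a fair proxy for processing speed, which ignores architectural differences between the Pentium 4 and the Pentium 3 Xeon as well as any context-switching cost. The function names are just illustrative.

```python
# Figures taken from the two box descriptions above.
BOX1_CLOCK_GHZ, BOX1_CPUS, BOX1_TIME_S = 2.4, 1, 150.0   # Pentium 4
BOX2_CLOCK_GHZ, BOX2_CPUS, BOX2_TIME_S = 0.5, 2, 20.0    # dual P3 Xeon


def aggregate_clock(clock_ghz, cpus):
    """Total clock rate across all CPUs (assumption: a crude speed proxy)."""
    return clock_ghz * cpus


def naive_expected_time(ref_time_s, ref_aggregate, target_aggregate):
    """Scale a measured time by the ratio of aggregate clock rates."""
    return ref_time_s * ref_aggregate / target_aggregate


box1_aggregate = aggregate_clock(BOX1_CLOCK_GHZ, BOX1_CPUS)
box2_aggregate = aggregate_clock(BOX2_CLOCK_GHZ, BOX2_CPUS)

# What Box 1 "should" take if only aggregate clock rate mattered.
expected_box1_time = naive_expected_time(BOX2_TIME_S, box2_aggregate, box1_aggregate)

# How much slower Box 1 actually is than Box 2.
observed_slowdown = BOX1_TIME_S / BOX2_TIME_S

# How far the measured Box 1 time is from the naive expectation.
gap_vs_expectation = BOX1_TIME_S / expected_box1_time

print(f"naive expected time on Box 1: {expected_box1_time:.1f} s")
print(f"observed slowdown vs Box 2:   {observed_slowdown:.1f}x")
print(f"measured / naive expectation: {gap_vs_expectation:.1f}x")
```

Even allowing generously for degradation from running four processes on one CPU, the measured 150 seconds is far outside what this simple scaling would suggest.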
Thanks,
Tony