Hi!
I ran the HPCC benchmark with a problem size of 60000x60000 on 20 nodes, using a 4x5 processor grid and block sizes of 64, 128, 256 and 512. I am having trouble evaluating the results. The FFTE performance is very low: is the Gflop/s figure reported for all 20 processors or for a single processor only? In either case, why is the performance so low compared to HPL? Also, what do the PTRANS results signify, and how do they differ from the communication bandwidth results?
HPL: maximum = 62 Gflop/s
FFT:
Minimum Gflop/s 0.399492
Average Gflop/s 0.421822
Maximum Gflop/s 0.431742
PTRANS results:
TIME      M      N   MB   NB  P  Q   TIME   CHECK   GB/s  RESID
----  -----  -----  ---  ---  -  -  -----  ------  -----  -----
WALL  30000  30000   64   64  4  5  12.69  PASSED  0.567   0.00
CPU   30000  30000   64   64  4  5   4.04  PASSED  1.782   0.00
WALL  30000  30000  128  128  4  5  12.46  PASSED  0.578   0.00
CPU   30000  30000  128  128  4  5   3.72  PASSED  1.935   0.00
WALL  30000  30000  256  256  4  5  12.36  PASSED  0.583   0.00
CPU   30000  30000  256  256  4  5   3.75  PASSED  1.920   0.00
WALL  30000  30000  512  512  4  5  13.80  PASSED  0.522   0.00
CPU   30000  30000  512  512  4  5   5.07  PASSED  1.420   0.00
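As far as I can tell, the GB/s column in the PTRANS output is consistent with the matrix size in bytes divided by the reported time. This is my own reading of the numbers, not something I took from the HPCC documentation, and it assumes 8-byte double-precision elements and 1 GB = 1e9 bytes. A small Python sketch of that check, using the WALL rows (the function name is just for illustration):

```python
# Assumption: the M x N matrix holds 8-byte doubles and is moved once.
BYTES_PER_DOUBLE = 8


def ptrans_gbps(m, n, seconds):
    """Return GB/s (1 GB = 1e9 bytes) for transposing an m x n matrix."""
    return m * n * BYTES_PER_DOUBLE / seconds / 1e9


# (block size, wall time in seconds, reported GB/s) from the WALL rows
wall_rows = [
    (64, 12.69, 0.567),
    (128, 12.46, 0.578),
    (256, 12.36, 0.583),
    (512, 13.80, 0.522),
]

for nb, seconds, reported in wall_rows:
    computed = ptrans_gbps(30000, 30000, seconds)
    print(f"NB={nb:3d}  computed={computed:.3f} GB/s  reported={reported:.3f} GB/s")
```

If this reading is right, the PTRANS GB/s is an aggregate figure for the whole 4x5 grid, which would make it a different kind of number from the ping-pong and ring bandwidths.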
Communication bandwidth and latency results:
Max Ping Pong Latency: 0.047505 msecs
Randomly Ordered Ring Latency: 0.048903 msecs
Min Ping Pong Bandwidth: 72.532088 MB/s
Naturally Ordered Ring Bandwidth: 34.848659 MB/s
Randomly Ordered Ring Bandwidth: 34.597277 MB/s
Thanks for your help.
Regards
Srinivasa patri