Hi Brian,
Thanks for the detailed response. I'm testing out LAM 7 from CVS now -
the numbers do look better. I tried NetPIPE; it's an interesting tool,
though unfortunately it is strictly point-to-point, so you can only run
it between 2 nodes at a time. I'd love to see a meshed implementation
that let you test bandwidth simultaneously between N nodes. (I know, I
know, I should stop complaining and write one :P) A rough sketch of what
I mean follows.
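Something like this, maybe - purely hypothetical and untested, all the
names and constants are mine: every rank streams 1 MB buffers to every
other rank at the same time and reports its sustained throughput.

/* meshbw.c - hypothetical sketch of an N-node aggregate bandwidth test.
 * Every rank exchanges MSG_BYTES with every other rank simultaneously,
 * then reports its own in+out throughput under full load. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MSG_BYTES (1 << 20)  /* 1 MB, the region where I saw ~850 Mbit/s */
#define REPS      20

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    char *sendbuf = malloc(MSG_BYTES);
    char *recvbuf = malloc((size_t)MSG_BYTES * size);  /* one slot per peer */
    MPI_Request *reqs = malloc(sizeof(MPI_Request) * 2 * size);
    memset(sendbuf, rank & 0xff, MSG_BYTES);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int r = 0; r < REPS; r++) {
        int n = 0;
        /* Post all receives first, then all sends, so every link is
         * active at once and nothing can deadlock. */
        for (int peer = 0; peer < size; peer++) {
            if (peer == rank) continue;
            MPI_Irecv(recvbuf + (size_t)peer * MSG_BYTES, MSG_BYTES,
                      MPI_BYTE, peer, 0, MPI_COMM_WORLD, &reqs[n++]);
        }
        for (int peer = 0; peer < size; peer++) {
            if (peer == rank) continue;
            MPI_Isend(sendbuf, MSG_BYTES, MPI_BYTE,
                      peer, 0, MPI_COMM_WORLD, &reqs[n++]);
        }
        MPI_Waitall(n, reqs, MPI_STATUSES_IGNORE);
    }
    double elapsed = MPI_Wtime() - t0;

    /* megabits moved per rank (out + in) over the whole run */
    double mbits = 2.0 * 8.0 * (double)MSG_BYTES * (size - 1) * REPS / 1e6;
    printf("rank %d: %.1f Mbit/s with all %d nodes talking at once\n",
           rank, mbits / elapsed, size);

    free(reqs); free(recvbuf); free(sendbuf);
    MPI_Finalize();
    return 0;
}

The nonblocking sends/receives are the point: the naive blocking version
would serialize the pairs, which is exactly what a meshed test should not
do.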
I compiled LAM 7 with the option to raise the short-to-long message
crossover point to 128 KB.
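(Side note: if I'm reading the LAM 7 docs right, the crossover can
apparently also be tuned at run time through the new SSI framework,
something like

  mpirun -ssi rpi_tcp_short 131072 C ./myprog

but I haven't verified the exact parameter name, so treat that as a
guess.)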
The numbers I got back from NetPIPE were encouraging. As one would
expect, with small message sizes (~100 bytes) I was getting only
12 Mbit/s, but with larger messages of, say, 1 MB I was seeing around
850 Mbit/s, with reasonably good performance (~300 Mbit/s) from 16 KB
and up. Incidentally, if you are interested, MPICH was only able to get
to about 250 Mbit/s.
This suggests to me that there isn't a problem with our switch or
drivers, but please correct me if you think I'm jumping the gun.
Can you explain "unexpected receives" to me? The application we are
running is the Linpack cluster benchmark (HPL == High Performance
Linpack, I believe), which I would expect to scale well. The latency
numbers from NetPIPE seem fine: the time for a 131 KB message, for
instance, is 0.001512 s, which works out to roughly
131072 * 8 / 0.001512 ≈ 693 Mbit/s of effective throughput, consistent
with the figures above.
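Just so you can correct me if I'm off base: my rough mental model of an
"unexpected receive" is a message that arrives before the matching
MPI_Recv has been posted, forcing the library to buffer it internally
in the meantime. A toy fragment of the scenario I'm imagining:

/* Hypothetical illustration of my (possibly wrong) reading of
 * "unexpected receive": rank 0's message reaches rank 1 before rank 1
 * has posted the MPI_Recv, so the MPI library must stash it first. */
#include <mpi.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank, buf = 42;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Send(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* sent early */
    } else if (rank == 1) {
        sleep(2);  /* busy elsewhere; the message arrives meanwhile */
        MPI_Recv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}

If HPL is generating a lot of these, I can see how the extra buffering
and copying might hurt, but I'd appreciate your take.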
Another interesting point: with LAM 7 running in -lamd mode, the lamd
process uses almost 0% CPU, vs. ~40% CPU in the same mode with LAM
6.5.9.
I may dig into XMPI tomorrow; it may be helpful.
Thanks again,
Peter McLachlan
e-mail: pmclachl_at_[hidden]