On Dec 14, 2004, at 6:04 PM, Robin Humble wrote:
> a Xeon cluster with InfiniBand and Topspin's hack of MPICH gets
> 6.3 microseconds latency, whereas LAM 7.1.1 is at about 15 microseconds.
> these were measured with NetPIPE 3.6.2's NPmpi program.
>
> Is this a known issue? Seems like it might be judging by section 3.2.2
> of the LAM users guide...
Yes, it is. We won't be fixing this in LAM/MPI -- a better IB device
will be included in Open MPI.
> Curiously enough, NetPIPE's native IB program (NPib) also gets about 15
> microseconds. So the Topspin MPI must be using IB differently from both
> LAM and NPib.
That's quite interesting as well -- I would have assumed that NetPIPE
used RDMA, but I have never looked personally.
> OTOH peak bandwidth was the same with the various MPIs, and LAM handles
> async messages better, so LAM is actually what we are using for real
> world runs...
Excellent -- great!
> Another (probably naive) IB question is how best to use dual-ported
> IB cards. e.g. can LAM force traffic to some nodes out one port on the
> HCA and traffic to other nodes out the other port?
Unfortunately, no. The IB RPI only uses one port at a time. That fits
in the same category as the latency issue -- it won't be changed in
LAM, but Open MPI will have better functionality.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/