LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-12-17 07:54:54


On Dec 14, 2004, at 6:04 PM, Robin Humble wrote:

> a Xeon cluster with InfiniBand and topspin's hack of MPICH gets
> 6.3 microseconds latency, whereas LAM 7.1.1 is at about 15 microseconds.
> these are measured with netpipe 3.6.2's NPmpi program.
>
> Is this a known issue? Seems like it might be judging by section 3.2.2
> of the LAM users guide...

Yes, it is. We won't be fixing this in LAM/MPI -- a better IB device
will be included in Open MPI.
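
For context on what a latency figure like the ones quoted above means: ping-pong benchmarks such as NetPIPE bounce a small message back and forth between two endpoints and report half of the average round-trip time. The following is only a minimal sketch of that idea, not NetPIPE's code. It assumes a local socket pair in place of MPI or InfiniBand, and the names `pingpong_latency_us`, `echo_server`, and `recv_exact` are made up for illustration, so the absolute numbers it prints say nothing about LAM or MPICH.

```python
import socket
import threading
import time


def recv_exact(conn, size):
    """Read exactly `size` bytes from a stream socket."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = conn.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo_server(conn, n_messages, size):
    """The 'pong' side: receive each message in full and send it straight back."""
    for _ in range(n_messages):
        conn.sendall(recv_exact(conn, size))


def pingpong_latency_us(n_messages=1000, size=1):
    """Estimate one-way latency in microseconds as half the mean round trip.

    Hypothetical helper: uses a local socket pair, not MPI or InfiniBand.
    """
    a, b = socket.socketpair()
    worker = threading.Thread(target=echo_server, args=(b, n_messages, size))
    worker.start()
    payload = b"x" * size
    try:
        start = time.perf_counter()
        for _ in range(n_messages):
            a.sendall(payload)      # ping
            recv_exact(a, size)     # wait for the pong
        elapsed = time.perf_counter() - start
    finally:
        a.close()                   # unblocks the echo thread if we failed early
        worker.join()
        b.close()
    # Mean round-trip time, halved to get a one-way figure, in microseconds.
    return elapsed / n_messages / 2 * 1e6


if __name__ == "__main__":
    print(f"one-way latency: {pingpong_latency_us():.1f} microseconds")
```

Timing many round trips and dividing keeps timer resolution and one-off startup costs from dominating the result, which matters when the quantity of interest is only a few microseconds.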

> Curiously enough, netpipe's native IB program (NPib) also gets about 15
> microseconds. So the topspin MPI must be using IB differently to both
> LAM and NPib.

That's quite interesting as well -- I would have assumed that Netpipe
used RDMA, but I have never looked personally.

> OTOH peak bandwidth was the same with the various MPIs, and LAM handles
> async messages better, so LAM is actually what we are using for real
> world runs...

Excellent -- great!

> Another (probably naive) IB question is how best to use dual ported
> IB cards. eg. can LAM force traffic to some nodes out one port on the
> HCA and traffic to other nodes out the other port?

Unfortunately, no. The IB RPI only uses one port at a time. That fits
in the same category as the latency issue -- it won't be changed in
LAM, but Open MPI will have better functionality.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/