LAM/MPI General User's Mailing List Archives

From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2003-06-04 23:27:01


On Wed, 4 Jun 2003, Peter McLachlan wrote:

> I compiled lam7 with the option to increase the short-message to
> long-message crossover point to 128KB.

As Robin said, you can also tune the HPL library itself - that will probably
make the biggest difference. Also, don't forget that in LAM 7.0 you can tune
the crossover point between short and long messages at run time - the User's
Guide covers which environment variables to set and all that good stuff.
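
For example, with the tcp RPI the crossover is just an SSI parameter that
you can set on the mpirun command line or in the environment - I'm writing
the parameter names from memory here, so double-check the spelling against
the User's Guide:

  # crossover as an mpirun flag (131072 bytes == 128KB)
  mpirun -ssi rpi tcp -ssi rpi_tcp_short 131072 C ./xhpl

  # or set it in the environment before running mpirun
  export LAM_MPI_SSI_rpi_tcp_short=131072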

If you are using SMP machines, you should also try both the pure TCP and
the shmem RPIs (RPI == our transport engine). For some apps, pure TCP
does better, for others, the combination of shared memory and TCP does
better. Really depends on the app. In 6.5.9, this was a compile-time
decision. As with many things, this is now a run-time choice in 7.0.
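
If I'm remembering the module names right, picking an RPI is now just a
flag to mpirun (the User's Guide has the authoritative list):

  mpirun -ssi rpi tcp C ./xhpl     # pure TCP, even between processes on a node
  mpirun -ssi rpi usysv C ./xhpl   # shared memory on-node, TCP between nodes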

Also, as Robin said, make sure you are setting the "CPU count" parameter
correctly and using "mpirun C" to run your app if you are using SMP machines
- that can cut down on your "out-of-box" (i.e., off-node) communication,
especially in an app that talks to neighbors more than to distant relatives.
Again, see the User's Guide for more info :).
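
Concretely, for a pair of dual-CPU nodes it looks something like this (the
hostnames are made up, obviously):

  # boot schema / hostfile
  node1.example.com cpu=2
  node2.example.com cpu=2

  lamboot hostfile
  mpirun C ./xhpl    # C == one process per CPU, so 4 processes here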

> The numbers I got back from netPIPE were encouraging. As one would expect
> with small message sizes (~100 bytes), I was getting only 12Mbps. However,
> with larger messages of, say, 1MB, I was getting around 850Mbps, with
> reasonably good performance from 16KB (~300Mbps) and up. Incidentally, if
> you are interested, MPICH was only able to get to about 250Mbps.
>
> This suggests to me there is not a problem with our switch or drivers. But
> please correct me if you think I am jumping the gun.

If the lines are fairly smooth, I would think your driver is probably ok.
The bad ones I have seen in the past had really jagged performance graphs
and were just all over the place.

> Can you explain to me what "unexpected receives" are? The application we
> are running is the Linpack cluster benchmark (HPL == High Performance
> Linpack, I believe). It should scale well, I would think. The latency
> numbers from netPIPE seem to be fine; the latency for a 131KB message, for
> instance, is 0.001512s, which seems reasonable.

So an unexpected receive is any message that arrives and has to be handled
by the MPI implementation before the matching MPI_*recv is posted. In
general, these aren't good for performance, but you might not be able to do
anything about them in your case (since you can't muck with the code too
much).
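
To make that concrete, here is a contrived little C sketch (not from HPL,
obviously) of what a pre-posted receive looks like - the point is just that
the receive goes up before the matching send can arrive:

  #include <mpi.h>

  /* Contrived example - run with "mpirun -np 2 ./a.out".  Rank 1 posts its
   * receive before the barrier, so rank 0's message (sent after the
   * barrier) lands directly in buf.  If the MPI_Irecv were posted after
   * the message had already arrived, the MPI library would have to stash
   * it internally as an "unexpected" message and copy it again later. */
  int main(int argc, char **argv)
  {
      int rank, buf = 0;
      MPI_Request req;
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 1) {
          /* Pre-post the receive as early as possible. */
          MPI_Irecv(&buf, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
      }

      MPI_Barrier(MPI_COMM_WORLD);

      if (rank == 0) {
          int val = 42;
          MPI_Send(&val, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Wait(&req, &status);
      }

      MPI_Finalize();
      return 0;
  }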

> Another interesting point: with lam7 running in -lamd mode, the lamd
> process uses almost 0% CPU, vs. 40% CPU in lamd mode with lam 6.5.9.

Interesting.... I think I fixed one timeout problem a very, very long
time ago in a galaxy far far away that would explain the difference.

Best of luck, and let us know if you find any problems with the 7.0 beta -
we hope to go stable RSN :).

Brian

-- 
  Brian Barrett
  LAM/MPI developer and all around nice guy
  Have a LAM/MPI day: http://www.lam-mpi.org/