LAM/MPI General User's Mailing List Archives

From: Tim Prince (n8tm_at_[hidden])
Date: 2007-07-12 00:42:36


pedropetrovitch_at_[hidden] wrote:
> Thanks for the answer... It actually brought me another question: can
> different implementations of MPI have high difference on execution times?
>

Why did people write so many MPI implementations? Why did I just spend a
few weeks trying to make a single application perform with one or more
of 4 different MPIs (not including LAM)? Even if you are talking only about
single-node performance, there are still many questions. Did you set
the -O parameter of LAM? Do you use an MPI which does not normally
differentiate between processes on the same or different nodes (e.g.
MPICH)? How are collectives implemented? Does it recognize which
processes have equal access to the same buffer, so that no data movement
is needed for message passing? Does it leave messages in a suitable length
range resident in the cache of the receiving process, so that the receiver
doesn't start out with cache misses? Does the MPI optimize its use of
system calls?

What is a high difference? To one of my recent bosses, 5% was high. On
a reasonable application with evenly distributed work, the MPI
shouldn't account for that much of the time on a single node, but it
could easily exceed that as one approaches the useful cluster size for the
application.

When comparing as many as 4 different MPI implementations, there is a
good chance that not all of them will be able to complete the job over the
full range of cluster sizes, so that in itself may be a definition of high
difference.
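
To put rough numbers on how the MPI's share of run time can pass a 5%
threshold as the process count grows, here is a toy model in Python. It is
only a sketch: the function name, the 100 s of total work, the 0.5 s cost
per communication step, and the log2 scaling (as for a tree-based
collective) are all made-up assumptions, not measurements of LAM or any
other MPI.

```python
import math


def mpi_time_fraction(procs, work_s=100.0, step_cost_s=0.5):
    """Fraction of run time spent communicating, under a toy model.

    Hypothetical assumptions: compute time is work_s / procs (perfectly
    even work distribution), and communication time is
    step_cost_s * log2(procs), as for a tree-based collective.
    """
    compute = work_s / procs
    comm = step_cost_s * math.log2(procs) if procs > 1 else 0.0
    return comm / (compute + comm)


if __name__ == "__main__":
    # Print the communication share for a range of process counts.
    for procs in (1, 2, 4, 8, 16, 32, 64):
        share = mpi_time_fraction(procs)
        print(f"{procs:3d} processes: {share:6.1%} of time in communication")
```

With these invented parameters the communication share stays under 5% up
to 4 processes and exceeds it at 8. A real comparison would replace the
model with timings taken from the application under each MPI.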