LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-02-13 19:02:41


On Feb 13, 2009, at 10:56 AM, Tim Prince wrote:

>> I know I wrote some text about this in an mpi magazine column a
>> while ago, but I unfortunately don't remember which one. See all
>> my articles at http://cw.squyres.com (I really need to move these
>> to www.open-mpi.org...).
>>
> Jeff's reply greatly appreciated. Now I hope "page currently
> unavailable" will be resolved before I forget about it.

I'm back at a keyboard for a short time... what page is unavailable?
cw.squyres.com appears to be up and functioning properly.

> We have found that opportunistic ordering in MPI_Allreduce can break
> major applications, depending on data set and willingness to restart
> and accept non-repeatable results, unless sum reduction is done with
> extra precision, or the applications promote it to double, or even
> capture Allreduce internally and process in a QA tested good order.
> Maybe I'm dense, but I don't know if this relates to the original
> question.

Definitely true. IIRC, LAM uses deterministic ordering of all
reductions. There *might* be a mode to go faster / produce
nondeterministic results, but I don't remember offhand. I'd be really
surprised if we made that mode the default. Correctness/repeatability
is fairly important -- but even MPI implementations have limitations
there, depending on things like machine homogeneity, topology, etc.
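
To make the ordering issue concrete, here is a rough sketch in plain Python (no MPI involved, and not LAM's actual reduction code). It only illustrates that floating-point addition is not associative, so combining the same per-rank contributions in a different order can change the sum. The function name `reduce_in_order`, the sample values, and the two orders are made up for illustration; `math.fsum` stands in for the "extra precision" approach Tim mentions.

```python
import math


def reduce_in_order(contributions, order):
    """Sum per-rank contributions left to right in the given rank order.

    Hypothetical helper: it mimics a sum reduction that combines
    values in whatever order the ranks happen to be visited.
    """
    total = 0.0
    for rank in order:
        total += contributions[rank]
    return total


# One value per "rank" (illustrative data, chosen to expose rounding).
contributions = [1e16, 1.0, -1e16, 1.0]

fixed_order = [0, 1, 2, 3]    # e.g. always combine in rank order
other_order = [0, 2, 1, 3]    # e.g. combine in arrival order

# Same inputs, different order, different answer:
# 1e16 + 1.0 rounds back to 1e16, so the first 1.0 is lost.
print(reduce_in_order(contributions, fixed_order))   # 1.0
print(reduce_in_order(contributions, other_order))   # 2.0

# An exactly rounded sum does not depend on the order.
print(math.fsum(contributions[r] for r in fixed_order))   # 2.0
print(math.fsum(contributions[r] for r in other_order))   # 2.0
```

A fixed combining order does not make the plain sum more accurate (it gives 1.0 here, where the exact sum is 2.0), but it does make the result repeatable from run to run, which is the property at stake above.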

-- 
Jeff Squyres
Cisco Systems