A follow-up to the follow-up. A fix for the usysv data corruption
problem on OS X has been committed to Subversion and will be included in
LAM 7.1.2. If you need the usysv RPI (as opposed to the sysv RPI,
which does not exhibit the data corruption issues), you can also try
the nightly build of LAM, available at:
http://www.lam-mpi.org/svn/
Hope this helps,
Brian
On Jan 7, 2005, at 12:53 PM, Brian Barrett wrote:
> It has been a couple of days, so I wanted to quickly follow up with a
> status on this issue. We now know what the problem is, and can say
> that it is only in the usysv RPI. The sysv RPI should not exhibit
> this problem. We have a solution mostly ready, but still have some
> build system issues to clean up. If you want the details, read on.
>
> The usysv RPI makes two assumptions about the underlying memory
> subsystem: 1) writes are always ordered and 2) caches are coherent.
> Unfortunately, this was lost from the documentation somewhere along
> the line (the usysv RPI is many years old - longer than I've been
> working on LAM). The PPC 970 (aka G5) does fairly aggressive
> instruction reordering that can result in unordered writes. This is
> complicated by the fact that the memory controller on a G5 machine can
> and does reorder memory writes / reads to better use the memory bus.
> So one of the two basic assumptions of the usysv RPI isn't met on an
> Apple G5 machine. I'm not sure about the IBM Blade servers based on
> the PPC 970, but would assume the same problem exists there as well.
>
> The PowerPC architecture does give us a solution to the problem, as
> there is an instruction to force the dispatch unit to dispatch all
> pending instructions before the current instruction completes. A
> well-behaved memory controller (which Apple's is) will also commit
> all pending writes before the next write is started. In short, we
> force the machine to temporarily appear to have ordered writes. While
> there appear to be a number of aggressive ways of doing this, the
> easiest (and what we are doing to fix the usysv code for 7.1.2) is to
> use the "sync" instruction between writes that must be ordered.
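
For anyone curious what "sync between writes that must be ordered" looks
like in practice, here is a minimal sketch in C. It is not the LAM usysv
code: the mailbox struct and the names write_barrier, mailbox_post and
mailbox_poll are made up for illustration, and it assumes a GCC-compatible
compiler. On PowerPC it issues "sync"; elsewhere it falls back to the
compiler's full-barrier builtin so the example still builds.

```c
#include <stdint.h>

/* Hypothetical mailbox living in a shared-memory segment.
 * Illustration only; this is not LAM's actual data layout. */
struct mailbox {
    volatile uint32_t payload;
    volatile uint32_t ready;   /* 0 = empty, 1 = payload is valid */
};

/* Full memory barrier: "sync" on PowerPC, GCC builtin elsewhere. */
static inline void write_barrier(void)
{
#if defined(__ppc__) || defined(__ppc64__) || \
    defined(__powerpc__) || defined(__powerpc64__)
    __asm__ __volatile__("sync" : : : "memory");
#else
    __sync_synchronize();
#endif
}

/* Writer: the payload must be visible before the flag is raised. */
void mailbox_post(struct mailbox *mb, uint32_t value)
{
    mb->payload = value;
    write_barrier();           /* order the payload write before the flag write */
    mb->ready = 1;
}

/* Reader: returns 1 and stores the payload in *out if one is waiting,
 * otherwise returns 0 and leaves *out untouched. */
int mailbox_poll(struct mailbox *mb, uint32_t *out)
{
    if (!mb->ready)
        return 0;
    write_barrier();           /* do not read the payload ahead of the flag */
    *out = mb->payload;
    mb->ready = 0;
    return 1;
}
```

Without the barrier in mailbox_post, a reader on another processor could
see ready == 1 while payload still holds the old value, which is the kind
of corruption described above. A full "sync" is the simplest choice, not
necessarily the cheapest one.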
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have an LAM/MPI day: http://www.lam-mpi.org/