On Sep 10, 2004, at 2:40 PM, David Kendig wrote:
>> Although applications are generally source compatible between LAM/MPI
>> and MPICH (they're both implementations of the same standard, after
>> all), there are slight differences in the implementation. Most of the
time, these things aren't noticeable, but sometimes running an MPI
>> application exclusively under one implementation and then bringing it
>> over to another implementation can highlight application bugs.
>
> My understanding is that MPI_ISEND in the MPICH implementation is
> blocking and in the LAM implementation the send is not blocking. Is
> this true, and if so, could that have exposed a flaw in our logic?
I'm not entirely sure what you mean, and I can't speak for MPICH. :-)
I can *guess* what they do, however -- it's probably quite similar to
LAM. Upon invocation of MPI_ISEND, we make a "first-pass attempt" to
send the message. Depending on the size of the message and the status
of other messages in front of it, none, some, or all of the message may
be sent immediately. Regardless, MPI_ISEND must return "immediately"
(i.e., without blocking). Hence, the MPI standard specifies that you are not allowed
to touch the buffer until MPI_TEST or MPI_WAIT indicates that the
communication has completed.
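In code terms, the rule looks something like this (a minimal sketch;
the function and variable names are mine, not from your application):

  #include <mpi.h>

  /* Sketch of the ISEND completion rule; "row", "n", and "dest" are
     placeholders, not names from your code. */
  void send_one_row(double *row, int n, int dest)
  {
      MPI_Request req;
      MPI_Status  status;

      MPI_Isend(row, n, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD, &req);

      /* Between the ISEND and the WAIT, "row" belongs to MPI:
         don't read it, and especially don't write it. */

      MPI_Wait(&req, &status);

      /* Only now is it safe to refill "row" with new values. */
  }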
If you update your buffer before TEST or WAIT indicates completion, this
could expose a slight operational difference between LAM/MPI and MPICH
(perhaps MPICH sends out more/all of the message in your particular case
than LAM does, and so you "get [un]lucky" and your application happens
to work under MPICH).
Could this be happening?
>> LAM is pretty stable, and while I'm obviously not going to say that it
>> is guaranteed to be 100% bug free, have you checked your app to ensure
>> that it doesn't make some MPI assumptions that may be true in MPICH but
>> aren't true in LAM?
>
>> From the wording of your mail, I can't quite tell what the exact
>> problem is -- are you just looking at the stdout from mpirun? Or are
>> your numbers output into files? If you're just looking at stdout, if
>> you have multiple MPI processes writing to stdout simultaneously, MPI
>> makes no guarantee about the order in which it is displayed. Indeed,
>> this is an inherent race condition -- you never know exactly which
>> node
>> is going to print when, etc. Is this what you're describing?
>
> I am not looking at the stdout but at the results that are written to
> a single file by a master node (node 0).
Gotcha. That should be pretty stable, then. I had to ask. :-)
> Rows of an output array
> are calculated on separate processors and then sent via MPI_ISEND
> to node 0, which does the final assembly and writing of the output
> array. Yes, this is the job of a parallel file system, but we had not
> realized such calls were part of MPI-2. Here is a snippet of a 'diff'
> between the single process results and the multiprocessor results.
>
> diff singleProcessor.results LAM.results
> 993c993
> < -3.47866893 -3.47866893 -3.47866893 -3.47866893 -3.47866893 -3.47866893
> ---
> > -3.47866893 -3.47866893 5.52133131 -3.47866893 -3.47866893 -3.47866893
> 1030c1030
> < 1.54010761 1.54010761 1.54010761 4.59215879 4.59215879 4.59215879
> ---
> > 4.59215879 1.54010761 1.54010761 4.59215879 4.59215879 4.59215879
>
> Particularly in the second difference, on line 1030, it would appear
> that the numerical values themselves are not incorrect, but possibly
> just their positions within the file.
If this is the case, I would say that re-using the buffer before a TEST
or WAIT completes is the prime suspect (e.g., you start the ISEND, then
update the buffer with some new values -- potentially in the middle,
like we're seeing here -- and then MPI actually comes through and sends
the buffer). I can't say for sure without knowing the nitty-gritty
details of your application, but that's as good a guess as any at this
point.
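Purely as an illustration (I'm guessing at your code here --
compute_row, nrows, and N are made up), the kind of worker loop that
would produce exactly this symptom is:

  double row[N];
  MPI_Request req;
  MPI_Status status;
  int i;

  /* Hypothetical buggy pattern: one buffer re-used for several ISENDs */
  for (i = 0; i < nrows; ++i) {
      compute_row(row, N, i);        /* refills the same buffer */
      MPI_Isend(row, N, MPI_DOUBLE, 0, i, MPI_COMM_WORLD, &req);
      /* BUG: the next iteration overwrites "row" before this send
         completes, so node 0 may receive a mix of old and new values */
  }

  /* One safe variant: complete each send before re-filling the buffer */
  for (i = 0; i < nrows; ++i) {
      compute_row(row, N, i);
      MPI_Isend(row, N, MPI_DOUBLE, 0, i, MPI_COMM_WORLD, &req);
      MPI_Wait(&req, &status);       /* "row" is free to re-use now */
  }

(An ISEND followed immediately by a WAIT is effectively a blocking send,
of course; if you want to keep the communication overlapped, give each
outstanding send its own buffer and request and finish them all with
MPI_WAITALL.)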
Hope that helps!
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/