On Oct 23, 2007, at 9:10 AM, Bodi Debnath wrote:
> I have a program that is doing shared memory window type data sharing
> between two executables (also called RMA). This program does its own
> buffering before sending a lot of messages over MPI. It's running on
> suse 64 with LAM 7.1.4.
>
> The problem in the pseudo-ish code below is the following: Both
> processes expose their memory using MPI_Post / MPI_Start. Do a bunch
> of MPI_Puts and close the window with MPI_Complete MPI_Wait. Right
> after that we read our local buffer to see what we received. When we
> read our local buffer, MPI still hasn't finished exposing the data to
> the local buffer I believe. Because after I send a lot of messages, I
> see that the messages transmitted at the tail end of the epoch are not
> present when I read the local buffer.
>
> When I instead put a sleep before I read my local buffer, (see the
> commented out line of code), I can read my messages fine. But if I
> replace the sleep with an MPI_Barrier, for example, then it doesn't work.
>
> Not sure what the problem is. Is it the fact that both executables are
> trying to write each other's buffer at the same time? Is it the
> multiple MPI_Puts? The data written to the local buffer is not
> supposed to be exposed until some later point? Missing some other kind
> of sync?

Sorry about the slow reply, it's been a long week. Anyway, I don't
see anything explicitly wrong with the pseudo-code, but as they say,
the devil is often in the details. If you have a short example
program that exhibits the problem, I can try to replicate it and
narrow it down further.

One thing that bothers me is that the sleep() fixes the problem.
LAM's one-sided support is implemented entirely on top of our
point-to-point engine, with completion accounted for on both the
origin and target processes. The only way a message can be delivered
is for it to be explicitly copied into the user buffer. That requires
MPI progress, which (for LAM, anyway) requires entering the MPI
library, so I can't see how a sleep would help. What interconnect are
you using? I could maybe see a way it could happen if you were using
Myrinet/GM and large message buffers, but even then, I'm pretty
confused. Have you tried your code with another
MPI (like Open MPI or MPICH2)? If it happens with multiple MPI
implementations, that would definitely help narrow it down a bit.
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!