On Mon, 1 Sep 2003, Pablo Milano wrote:
> Let me see if I understood you properly,
> If the buffer size is too large, MPI is unable to "start" all the
> communications, so there are a lot of pending "starts".
That's essentially right. For messages that are too long, LAM chooses to
switch into a rendezvous mode for sending the message -- i.e., it does not
send the message until the matching receive is actually posted on the
receiver.
> Then, if I call MPI_Test a few times inside my calculations, the
> processor is able to start some of the pending sends/receives.
If a matching receive has been posted, yes.
> In this case, I should study the cost of calling MPI_Test, because its
> overhead might eat into the performance gain from nonblocking
> communications.
If you call it too often, yes.
> Do you suggest calling MPI_Test a few times, spread out while the
> program is calculating in step 4? What would happen if I call MPI_Test a
> couple of times just between steps 3 and 4? This way, I can use a loop
> and dynamically adjust the number of times MPI_Test is called based on
> the buffer size.
This sounds reasonable. Keep in mind that since LAM is a single-threaded
MPI implementation, there's a limit to how effective this will be -- it
has to do with how the underlying network transport passes data between
two processes.
For example, in LAM's TCP module, LAM pushes out as much data as it can,
but the operating system may block writing to the socket at any time.
Hence, LAM is typically limited in how much it can actually push out
across the socket before the underlying OS / TCP protocol will not allow
any more to be sent without an acknowledgement. As a result, a single
MPI_Send may be broken up into multiple write()'s down a socket before the
entire message is sent. This is the kind of thing that invoking MPI_Test()
(or MPI_Wait()) will do nicely -- re-enter the TCP module's state machine,
see how far along a request is (i.e., how many bytes still remain to be
sent), and attempt to advance it further without blocking. Make sense?
Depending on what is happening in the MPI context, LAM's TCP module may
block (waiting to send the rest of the message) or it may spin trying to
make progress on other pending sends (e.g., sending across other sockets).
Additionally, even though LAM may not be able to push any more bytes down
a socket, the OS may be making progress "in the background" while the MPI
process is swapped out (e.g., perhaps while it is not even in the MPI
layer).
Then again, for other transports, it's totally different. gm, for
example, has a wholly separate communications processor. Hence, when you
start a send (assuming that the matching receive has already been posted),
the gm processor handles all the data movement, and LAM is simply
notified when the message has been fully delivered. So additional
MPI_Test's in this case are not helpful (indeed, depending on how many
times you call MPI_Test, they could turn into significant overhead).
So your best bet is probably to invoke MPI_Test (or one of its variants)
a few times throughout your inner loop. You probably would not want to
invoke it too frequently, but it depends on your specific application.
Also, be sure to post your receives before your sends; this helps avoid
unexpected messages (and the extra memory copies they entail). You may
also wish to investigate whether MPI_Wait (and its variants) would help,
too.
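Concretely, the overall structure might look something like the sketch
below (just an outline, not tested code -- the buffers, peer, counts,
poll_interval, and do_some_computation() are placeholders for whatever
your application actually uses):

```c
MPI_Request reqs[2];
int i, flag;

/* Post receives first to avoid unexpected messages. */
MPI_Irecv(rbuf, count, MPI_DOUBLE, peer, tag, MPI_COMM_WORLD, &reqs[0]);
/* Then start the sends. */
MPI_Isend(sbuf, count, MPI_DOUBLE, peer, tag, MPI_COMM_WORLD, &reqs[1]);

/* Compute, occasionally poking the progress engine. */
for (i = 0; i < niters; ++i) {
    do_some_computation(i);
    if (i % poll_interval == 0) {
        /* Re-enters LAM's progress engine (e.g., the TCP state
           machine); flag is ignored here -- we only want progress. */
        MPI_Testall(2, reqs, &flag, MPI_STATUSES_IGNORE);
    }
}

/* Ensure completion before reusing the buffers. */
MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
```

Tuning poll_interval (as you suggested, perhaps based on the buffer size)
is exactly the kind of knob you'd want to experiment with.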
> Where can I find some deeper documentation about the MPI implementation
> of nonblocking operations? I need to know more about it.
For the implementation, there really isn't any -- just the code in the
freely-available MPI implementations (e.g., LAM). :-(
The intent is that you start a non-blocking operation and then the MPI
implementation does the best that it can to deliver it as fast as
possible.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/