
LAM/MPI General User's Mailing List Archives


From: Pablo Milano (pablom_at_[hidden])
Date: 2003-09-03 07:59:38


Thank you very much for your thorough explanation.

I tried adding a call to MPI_Test in the inner loop and saw some
performance improvement. At this point, blocking and nonblocking have
almost the same performance.
I was thinking about replacing the basic nonblocking MPI_Isend and
MPI_Irecv calls with persistent communication requests. Do you think I
would get a performance benefit in LAM/MPI from this? Would I run into
the same TCP problems?

Thanks again,
Pablo
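[A minimal sketch of the persistent-request pattern asked about above. The ring-neighbor exchange, message counts, and iteration count are illustrative assumptions, not from this thread; the point is that the setup cost of MPI_Recv_init/MPI_Send_init is paid once, and each iteration only starts and completes the prebuilt requests.]

```c
/* Sketch: persistent requests instead of per-iteration Isend/Irecv.
 * Ranks, counts, and the ring-neighbor exchange are assumptions. */
#include <mpi.h>

#define N 100000

int main(int argc, char **argv)
{
    static double sendbuf[N], recvbuf[N];
    MPI_Request reqs[2];
    int rank, size, peer;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    peer = (rank + 1) % size;          /* exchange with a ring neighbor */

    /* Build the requests once, outside the iteration loop;
       the receive is set up first. */
    MPI_Recv_init(recvbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Send_init(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    for (int iter = 0; iter < 10; ++iter) {
        MPI_Startall(2, reqs);         /* begin both transfers */
        /* ... computation here, calling MPI_Testall occasionally ... */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }

    MPI_Request_free(&reqs[0]);
    MPI_Request_free(&reqs[1]);
    MPI_Finalize();
    return 0;
}
```

Whether this beats plain Isend/Irecv depends on how much per-call setup the implementation can actually skip; the underlying transport behavior (including the TCP rendezvous issue discussed below) is the same.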

> -----Original Message-----
> From: lam-bounces_at_[hidden]
> [mailto:lam-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Tuesday, September 02, 2003 4:50 PM
> To: General LAM/MPI mailing list
> Subject: RE: LAM: RE: Performance on MPI using nonblocking communications
>
>
> On Mon, 1 Sep 2003, Pablo Milano wrote:
>
> > Let me see if I understood you correctly:
> > if the message is too large, MPI is unable to "start" all of the
> > communications, so there are a lot of pending "starts".
>
> That's essentially right. For messages that are too long, LAM chooses to
> switch into a rendezvous mode for sending the message -- i.e., not send
> the message until the receive is actually posted on the receiver.
>
> > Then, if I call MPI_Test a few times inside my calculation, the
> > process is able to start some of the pending sends/receives.
>
> If a matching receive has been posted, yes.
>
> > In this case, I should study the cost of calling MPI_Test, because it
> > might eat into the performance gain from nonblocking communications.
>
> If you call it too often, yes.
>
> > Do you suggest calling MPI_Test a few times, spread out, while the
> > program is calculating in step 4? What would happen if I called
> > MPI_Test a couple of times just between steps 3 and 4? That way, I
> > could use a loop and dynamically adjust the number of times MPI_Test
> > is called based on the buffer size.
>
> This sounds reasonable. Keep in mind that since LAM is a single-threaded
> MPI implementation, there's a limit to how effective this will be -- it
> has to do with how the underlying network transport passes data between
> two processes.
>
> For example, in LAM's TCP module, LAM pushes out as much data as it can,
> but the operating system may block writing to the socket at any time.
> Hence, LAM is typically limited in how much it can actually push out
> across the socket before the underlying OS / TCP protocol will not allow
> any more to be sent without an acknowledgement. As a result, a single
> MPI_Send may be broken up into multiple write()'s down a socket before
> the entire message is sent. This is the kind of thing that invoking
> MPI_Test() (or MPI_Wait()) will do nicely -- re-enter the TCP module
> state machine, see how far along a request is (i.e., how many bytes
> still remain to be sent), and attempt to advance it further without
> blocking. Make sense?
>
> Depending on what is happening in the MPI context, LAM's TCP module may
> block (waiting to send the rest of the message) or it may spin trying to
> make progress on other pending sends (e.g., sending across other
> sockets). Additionally, even though LAM may not be able to push any more
> bytes down a socket, the OS may be making progress "in the background"
> while the MPI process is swapped out (e.g., perhaps while it is not even
> in the MPI layer).
>
> Then again, for other transports, it's totally different. gm, for
> example, has a wholly separate communications processor. Hence, when you
> start a send (assuming that the matching receive has already been
> posted), the gm processor handles all the data movement, and LAM is
> simply notified when the message has been fully delivered. So additional
> MPI_Test's in this case are not helpful (indeed, depending on how many
> times you call MPI_Test, it could turn into significant overhead).
>
> So your best bet is probably to invoke MPI_Test (or any of its variants)
> a few times throughout your inner loop. You probably would not want to
> invoke it too frequently, but it depends on your specific application.
> Also, be sure to post your receives before your sends. This helps
> prevent unexpected messages (and additional memory copies). You may also
> wish to investigate using MPI_Wait (and its variants) to see if that
> will help, too.
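
[In code, that advice might look roughly like the following sketch. The poll interval, the work loop, and the function's parameters are illustrative assumptions; the receive is posted before the send, and MPI_Testall is called periodically to re-enter the progress engine during computation.]

```c
/* Sketch: start nonblocking transfers, then poke the MPI progress
 * engine with MPI_Testall every so often during the compute phase.
 * poll_interval and the work loop are illustrative assumptions. */
#include <mpi.h>

void exchange_and_compute(double *sendbuf, double *recvbuf, int n,
                          int peer, long work_items)
{
    MPI_Request reqs[2];
    int done = 0;
    const long poll_interval = 1000;   /* tune per application / buffer size */

    /* Post the receive before the send to avoid unexpected messages */
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

    for (long i = 0; i < work_items; ++i) {
        /* ... one unit of computation ... */
        if (!done && i % poll_interval == 0)
            MPI_Testall(2, reqs, &done, MPI_STATUSES_IGNORE);  /* advance progress */
    }

    if (!done)
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* guarantee completion */
}
```

Making poll_interval adaptive (e.g., polling more often for larger messages, as suggested earlier in the thread) is a reasonable refinement, since the right frequency depends on how many write()'s the transport needs per message.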
>
> > Where can I find some deeper documentation about the MPI
> > implementation of nonblocking operations? I need to know more about it.
>
> For the implementation, there really isn't any -- just the code in the
> freely-available MPI implementations (e.g., LAM). :-(
>
> The intent is that you start a non-blocking operation and then the MPI
> implementation does the best that it can to deliver it as fast as
> possible.
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>