Thanks Brian, I never realised that LAM progresses _all_ pending
communications whenever _any_ communication function is called.
Understanding that clarifies everything.
Mike
> The MPI_Send() will have to wait until the process that posted the
> matching receive has entered an MPI function that does communication (it
> could be another send - LAM will progress all pending communication
> whenever any communication function is called).
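>
> For instance, a minimal sketch (my example, assuming the TCP transport
> and a message count large enough to fall into the rendezvous protocol):
>
>   /* progress_demo.c: rank 0's MPI_Send can only complete once rank 1
>    * re-enters the MPI library and the progression engine runs. */
>   #include <mpi.h>
>   #include <unistd.h>
>
>   #define N 65536   /* assumed to be above the eager limit */
>
>   int main(int argc, char **argv)
>   {
>       static double buf[N];
>       int rank;
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>       if (rank == 0) {
>           /* blocks until rank 1's MPI_Wait sends back the CTS */
>           MPI_Send(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
>       } else if (rank == 1) {
>           MPI_Request req;
>           MPI_Status status;
>
>           MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);
>           sleep(5);                 /* no MPI calls: rank 0 stays blocked */
>           MPI_Wait(&req, &status);  /* progression happens here */
>       }
>       MPI_Finalize();
>       return 0;
>   }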
>
>> i.e., for a rendezvous protocol, can the different stages in the receive
>> (such as the ack) be progressed by either MPI_Irecv or MPI_Wait,
>> depending on whether there is a send request pending? Does MPI_Irecv
>> always post an ack regardless (i.e., a "ready" rather than an "ack")? And
>> can the different stages only be progressed when the receiving process
>> is in one of these two calls?
>
>
> No - the ack is a CTS, meaning that the receiving side has somewhere to
> put the incoming message and is ready to receive it (and deal with
> truncation errors and the like). I suppose you could implement a system
> where you pre-sent the ACKs for Irecvs that had a specified source, but
> the structure of our progression engine does not permit this.
>
>> Similarly, if the sender uses MPI_Isend() instead of MPI_Send(), and
>> there is already an ack waiting from the receiver from a prior
>> MPI_Irecv, does MPI_Isend progress to the next stage and start the
>> data transfer, effectively completing the communication from the
>> sender's perspective?
>
>
> Since we don't pre-post ACKs, it doesn't quite work like that. We still
> need the MPI_Irecv to say "yeah, I've got room for that message" before
> we start shoving the data around.
>
>> With an eager protocol, is the send operation purely local?
>
>
> With the eager protocol, sends (with the exception of MPI_Ssend) are
> basically local. They hand the message to the kernel's TCP stack and
> forget about it; the kernel deals with the rest.
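>
> A sketch of the eager case (128 bytes here is just assumed to be below
> the eager limit, which is an implementation-tuned cutoff):
>
>   char small[128] = "hello";
>   /* returns as soon as the bytes are handed to the kernel's TCP stack,
>    * whether or not the receiver has even posted a matching receive */
>   MPI_Send(small, 128, MPI_CHAR, 1, 0, MPI_COMM_WORLD);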
>
> I should point out that the discussion above is for the TCP transport.
> The details for the Myrinet/gm and upcoming InfiniBand transports are a
> little bit different, but the general concepts are the same. The shared
> memory transport requires interactions from both parties for pretty much
> every send, but if you have enough shared memory available, the amount
> of lockstepping done isn't too horrible.
>
> Hope this helps,
>
> Brian
>
>
>> Brian W. Barrett wrote:
>>
>>> On Mon, 24 May 2004 mikea_at_[hidden] wrote:
>>>
>>>> Would somebody please be able to clarify for me the finer distinctions
>>>> between the different send and receive calls (buffered and standard,
>>>> blocking and non-blocking) in terms of the time taken by the calling
>>>> process? That is, I'm not concerned about the time taken to complete
>>>> the entire communication, but only the time that the calling process
>>>> must spend inside the MPI calls. I'm aware of what the MPI standard
>>>> states, but it's not clear to me how it is interpreted within LAM/MPI,
>>>> given that it is single-threaded and presumably does not have internal
>>>> "progress" threads. My understanding of the LAM documentation is that
>>>> the communication only progresses when the calling process is inside an
>>>> MPI call, hence some of the advantages of non-blocking and buffered
>>>> sends disappear. Specifically I'm guessing that:
>>>>
>>>> 1. Buffering should ensure that send() can return as soon as the
>>>> message is buffered, but with LAM I presume that a buffered send()
>>>> should take at least as long as a non-buffered send() to return, as in
>>>> both cases the call cannot return until the sending process is
>>>> completely finished with the message.
>>>
>>> So this is a really hard question to answer, but here are some general
>>> notes... MPI_Send and MPI_Bsend both use an eager protocol for sending
>>> short messages. Above a certain size, a rendezvous protocol is used
>>> that requires an ACK from the receiving side before the actual data is
>>> sent. MPI_Ssend always uses the rendezvous protocol, and MPI_Isend does
>>> something similar to MPI_Send and MPI_Bsend.
>>>
>>> Note that just because a send has returned does *NOT* mean the message
>>> has been sent. For MPI_Send and MPI_Bsend, a return means that the
>>> supplied buffer can be reused by the user. For MPI_Ssend, a return
>>> means that the receiving side has *started* to receive the message.
>>> For MPI_Isend, a return only means that the send has started.
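>>>
>>> As a sketch of what that means for user code:
>>>
>>>   double buf[1024];
>>>   MPI_Request req;
>>>   MPI_Status status;
>>>
>>>   MPI_Isend(buf, 1024, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
>>>   /* buf must NOT be touched here - the send has only started */
>>>   MPI_Wait(&req, &status);
>>>   /* buf may now be reused, though the message may still be in
>>>    * flight; only MPI_Ssend implies the receiver has started it */
>>>   buf[0] = 0.0;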
>>>
>>>> 2. Similarly for LAM, the combined time spent in a non-blocking send()
>>>> and its matching wait() will be at least that spent in a blocking
>>>> send(), regardless of the time interval between isend() and wait(),
>>>> unless LAM uses "progress" threads. The same applies for receive().
>>>
>>> Not necessarily. It could be more, the same, or less, which isn't
>>> really a useful answer, so let's look at the TCP case:
>>>
>>> same: this is fairly trivial - you guessed how this would work...
>>>
>>> less: remember that the OS does some buffering for us. Say we are
>>> doing a rendezvous-protocol send. During the Isend call, we send the
>>> RTS and return. During the Wait, we grab the CTS that is already
>>> waiting for us and start sending data. We manage to skip an entire
>>> round trip of the RTS/CTS protocol (see the sketch below). There are
>>> also some buffering cases that make Isends faster as well...
>>>
>>> more: if you have lots of Isends posted, you can actually slow things
>>> down in the bookkeeping as LAM tries to progress all the messages.
>>> This is especially true with the shared memory protocols, if I recall
>>> correctly.
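>>>
>>> The "less" case as a sketch (buf, n, req, status, and do_useful_work()
>>> are placeholders):
>>>
>>>   MPI_Isend(buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req); /* RTS out */
>>>   do_useful_work();          /* no MPI needed here; meanwhile the
>>>                               * receiver's CTS arrives and sits in the
>>>                               * kernel's socket buffer */
>>>   MPI_Wait(&req, &status);   /* the CTS is already here, so the data
>>>                               * starts moving without another round trip */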
>>>
>>>> 3. Using the lamd rpi module, will a standard blocking send() return
>>>> once the message has been passed to the local daemon, or only when a
>>>> matching receive has also been posted? I'm guessing the former, and
>>>> that the above points also apply to lamd. I can see an advantage in
>>>> using lamd, as communication with a process on the local node is
>>>> cheaper than with one on a remote node, but will there be a difference
>>>> depending on whether the send is blocking or non-blocking, buffered or
>>>> standard?
>>>
>>> Remember that the standard mostly talks about when it is safe to reuse
>>> the buffer. Since you can reuse the buffer as soon as the message is
>>> handed to the local lamd, the apparent latency for medium-sized
>>> messages is often lower than with the TCP rpi. However, the realized
>>> bandwidth will be much lower, so it isn't necessarily going to be a
>>> win.
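>>>
>>> If you want to compare the two yourself, something like this should do
>>> it (assuming LAM 7.x's SSI syntax for run-time rpi selection):
>>>
>>>   mpirun -ssi rpi lamd C ./a.out   # route messages through the lamds
>>>   mpirun -ssi rpi tcp C ./a.out    # direct TCP between the processes
>>>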
>>> Hope this helps...
>>> Brian
--
-----------------------------------------------------------------------
Computational Neurobiology Laboratory : Phone: 858 453-4100x1455
The Salk Institute of Biological Studies : Fax: 858 587-0417
10010 N. Torrey Pines Rd, La Jolla, CA 92037 : Email: mikea_at_[hidden]
: www.cnl.salk.edu/~mikea