Thanks, Brian, for your reply. Could you please clarify the
following:
1. MPI_Bsend() returns when the buffer is free to reuse, but I'm still
confused as to whether this also implies (in the LAM implementation)
that the sending process has finished its part of the communication. For
example, with a rendezvous protocol, does it mean that the negotiation has
completed (the ACK has been received from the receiving process) and
the data has been passed to the OS to send, i.e. there is nothing more for
the sending process to do? Am I correct in assuming that progress
threads are not being used in LAM?
If the above is true, then for LAM, MPI_Bsend should not offer any
advantage in terms of apparent latency over MPI_Send?
thanks
Mike
Brian W. Barrett wrote:
> On Mon, 24 May 2004 mikea_at_[hidden] wrote:
>
>
>>Would somebody please be able to clarify for me the finer distinctions
>>between the different send and receive calls (buffered and standard,
>>blocking and non-blocking) in terms of the time taken by the calling
>>process? That is, I'm not concerned about the time taken to complete the
>>entire communication, but only the time that the calling process must
>>spend inside the MPI calls. I'm aware of what the MPI standard states,
>>but it's not clear to me how it is interpreted within LAM/MPI given that
>>it is single-threaded and presumably does not have internal "progress"
>>threads. My understanding of the LAM documentation is that the
>>communication only progresses when the calling process is inside an MPI
>>call, hence some of the advantages of non-blocking and buffered sends
>>disappear. Specifically I'm guessing that:
>>
>>1. Buffering should ensure that send() can return as soon as the message
>>is buffered, but with LAM I presume that a buffered send() should take
>>at least as long as a non-buffered send() to return, as in both cases
>>the call cannot return until the sending process has fully finished
>>with the message.
>
>
> So this is a really hard question to answer, but just some general
> notes... MPI_Send and MPI_Bsend both use an eager protocol for sending
> short messages. After some point, a rendezvous protocol is used that
> requires an ACK from the receiving side before sending the actual data.
> MPI_Ssend always uses the rendezvous protocol, and MPI_Isend does
> something similar to MPI_Send and MPI_Bsend.
>
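To make the protocol choice described above concrete, here is a minimal
Python sketch. It is a toy model and not LAM's code: the function name
`choose_protocol` and the 64 KiB `EAGER_LIMIT_BYTES` threshold are
assumptions made purely for illustration (the real crossover point is not
stated in this thread), and treating MPI_Isend exactly like MPI_Send is a
simplification of "something similar".

```python
# Hypothetical crossover size; NOT LAM's actual value.
EAGER_LIMIT_BYTES = 64 * 1024


def choose_protocol(call, nbytes, eager_limit=EAGER_LIMIT_BYTES):
    """Return "eager" or "rendezvous" for a send call and message size."""
    if call == "MPI_Ssend":
        # Synchronous sends always wait for the receiver's ACK.
        return "rendezvous"
    if call in ("MPI_Send", "MPI_Bsend", "MPI_Isend"):
        # Short messages are pushed immediately; long ones negotiate first.
        return "eager" if nbytes <= eager_limit else "rendezvous"
    raise ValueError("unknown send call: %r" % (call,))


if __name__ == "__main__":
    for call in ("MPI_Send", "MPI_Bsend", "MPI_Ssend", "MPI_Isend"):
        for nbytes in (1024, 1024 * 1024):
            print(call, nbytes, choose_protocol(call, nbytes))
```

Passing a different `eager_limit` shows how the same message can switch
from one protocol to the other without any change in the calling code.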
> Note that just because a send has returned does *NOT* mean the message has
> been sent. For MPI_Send and MPI_Bsend, a return means that the supplied
> buffer can be reused by the user. For MPI_Ssend, a return means that the
> receiving side has *started* to receive the message. For MPI_Isend, a
> return means only that the send has been started.
>
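The return semantics above can be written down as a small lookup table.
This is only a sketch in Python; the names `Guarantee`, `ON_RETURN` and
`safe_to_overwrite` are made up for illustration, and "False" here means
"not guaranteed at return", not "has not happened". The MPI_Isend entry
reflects the usual rule that the buffer may only be reused once a matching
MPI_Wait (or a successful MPI_Test) has completed.

```python
from collections import namedtuple

# What the caller may assume at the moment the call returns.
Guarantee = namedtuple("Guarantee", ["buffer_reusable", "receiver_started"])

ON_RETURN = {
    "MPI_Send": Guarantee(buffer_reusable=True, receiver_started=False),
    "MPI_Bsend": Guarantee(buffer_reusable=True, receiver_started=False),
    "MPI_Ssend": Guarantee(buffer_reusable=True, receiver_started=True),
    "MPI_Isend": Guarantee(buffer_reusable=False, receiver_started=False),
}


def safe_to_overwrite(call, waited=False):
    """Return True if the send buffer may be overwritten.

    `waited` says whether the request has been completed with MPI_Wait
    or MPI_Test; it only changes the answer for the non-blocking MPI_Isend.
    """
    if call not in ON_RETURN:
        raise ValueError("unknown send call: %r" % (call,))
    return ON_RETURN[call].buffer_reusable or waited


if __name__ == "__main__":
    for call in sorted(ON_RETURN):
        print(call, ON_RETURN[call], safe_to_overwrite(call))
```

Note that no entry has a "message delivered" field: none of these calls
promises delivery just by returning.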
>
>>2. Similarly for lam, the combined time spent in a non-blocking send()
>>and its matching wait() will be at least that spent in a blocking
>>send(), regardless of the time interval between isend() and wait(),
>>unless lam uses "progress" threads. The same applies for receive().
>
>
> Not necessarily. It could be more, the same, or less. Which isn't really
> a useful answer, but let's look at the TCP case:
>
> same: this is fairly trivial - you guessed how this would work...
>
> less: remember that the OS does some buffering for us. Let's say that we
> are doing a rendezvous protocol send. During the Isend call, we send
> the RTS (request to send) and return. During the Wait, we grab the CTS
> (clear to send) that is already waiting for us and start sending data.
> We manage to skip waiting out an entire round trip for the RTS/CTS
> protocol. There are also some buffering cases that make Isends faster
> as well...
>
> more: If you have lots of Isends posted, you can actually slow things down
> in the bookkeeping as LAM tries to progress all the messages. This is
> especially true on the shared memory protocols, if I recall correctly.
>
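The same/less/more cases above can be illustrated with a toy accounting of
the time spent inside the library for one rendezvous-sized message. This
Python sketch is an assumption-laden model, not a measurement: the function
names, the per-request bookkeeping term and all the microsecond figures are
invented for illustration, and the cost of sending the RTS itself is
ignored.

```python
def blocking_send_time(rtt_us, data_us):
    """Time inside a blocking rendezvous send: RTS/CTS round trip, then data."""
    return rtt_us + data_us


def isend_wait_time(rtt_us, data_us, gap_us, pending=1, per_request_us=0):
    """Combined time inside Isend and Wait under the toy model.

    gap_us is the time the caller spends outside the library between the
    two calls; thanks to OS buffering the CTS can arrive during that gap.
    """
    # Part of the RTS/CTS round trip that Wait still has to sit through.
    remaining_handshake = max(0, rtt_us - gap_us)
    # Hypothetical cost of progressing every posted request.
    bookkeeping = pending * per_request_us
    return remaining_handshake + data_us + bookkeeping


if __name__ == "__main__":
    rtt, data = 100, 400  # made-up figures, in microseconds
    print("blocking:", blocking_send_time(rtt, data))
    print("same:", isend_wait_time(rtt, data, gap_us=0))
    print("less:", isend_wait_time(rtt, data, gap_us=250))
    print("more:", isend_wait_time(rtt, data, gap_us=0,
                                   pending=50, per_request_us=2))
```

With no gap and no bookkeeping the two paths cost the same; a gap longer
than the round trip hides the handshake; and enough posted requests can
outweigh that saving.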
>
>>3. Using the lamd rpi module, will a standard blocking send() return
>>once the message has been passed to the local daemon, or only when a
>>matching receive has also been posted? I'm guessing the former, and that
>>the above points also apply to lamd. I can see an advantage in using
>>lamd, as communication with a process on the local node is cheaper than
>>with one on a remote node, but will there be a difference whether the
>>send is blocking or non-blocking, buffered or standard?
>
>
> Remember that the standard talks about when it is safe to reuse the buffer
> more than anything. Since you can reuse the buffer as soon as the message
> is sent to the local lamd, the apparent latency for medium-sized messages
> is often lower than the TCP rpi. However, the realized bandwidth will be
> much lower, so it isn't necessarily going to be a win.
>
> Hope this helps...
>
>
> Brian
>
--
-----------------------------------------------------------------------
Computational Neurobiology Laboratory : Phone: 858 453-4100x1455
The Salk Institute of Biological Studies : Fax: 858 587-0417
10010 N. Torrey Pines Rd, La Jolla, CA 92037 : Email: mikea_at_[hidden]
: www.cnl.salk.edu/~mikea