This is because LAM is a single-threaded MPI implementation -- it
cannot make progress on the rendezvous protocol when it is not in the
MPI library.
Consider two scenarios:
1. If the sender initiates the send at time T, and the receiver does
not go into the MPI library until time T+N, then the ACK won't be
sent back to the sender (and therefore the real data transfer won't be
initiated) until T+N+1. N here represents the delay between the
send and when the receive is actually posted on the receiver
(potentially a large value).
2. In the above scenario, if the receive has *already* been posted by
the time the sender initiates the send at time T, the sender may still
not get the ACK back during the ISEND, and therefore simply returns
without initiating the actual data transfer. Hence, the ACK is not
noticed until the WAIT, and the transfer is actually initiated then.
This is likely the case that you're seeing: it's the sender's delay in
re-entering the MPI library (the gap between its ISEND and its WAIT)
that is delaying your receiver.
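To make the two scenarios concrete, here is a toy timing model in C. It is not LAM code, only a sketch of the reasoning above under stated assumptions: integer time ticks, a one-tick delay for each control message (the request-to-send and the ACK), a receiver that sits in a blocking MPI_RECV once it posts the receive, and a sender that only makes progress at the instants it is inside the library (ISEND, any extra calls such as MPI_TEST, and a blocking WAIT). The function name rendezvous_start is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model, not LAM internals: each control message
   (request-to-send or ACK) takes LINK_DELAY ticks to arrive. */
#define LINK_DELAY 1

static long max_long(long a, long b) { return a > b ? a : b; }

/* Returns the tick at which the sender starts the bulk data transfer.
 *   isend_time: sender calls ISEND and sends the request-to-send
 *   recv_time:  receiver enters a blocking RECV and stays in the library
 *   wait_time:  sender enters a blocking WAIT and stays in the library
 *   polls:      sorted times of extra library calls (e.g. MPI_TEST) made
 *               by the sender; may be NULL when npolls == 0 */
long rendezvous_start(long isend_time, long recv_time, long wait_time,
                      const long *polls, size_t npolls)
{
    assert(wait_time >= isend_time);

    /* The receiver can only ACK once the request-to-send has arrived
       AND it is inside the library with the receive posted. */
    long ack_sent = max_long(recv_time, isend_time + LINK_DELAY);
    long ack_arrives = ack_sent + LINK_DELAY;

    /* ISEND itself can never see the ACK (it arrives at least two ticks
       later), so the sender notices it at its first extra call after
       arrival, if one happens before the WAIT... */
    for (size_t i = 0; i < npolls; i++) {
        if (polls[i] >= ack_arrives && polls[i] < wait_time)
            return polls[i];
    }
    /* ...otherwise only inside the blocking WAIT. */
    return max_long(wait_time, ack_arrives);
}
```

With T=0 and N=100, rendezvous_start(0, 100, 5, NULL, 0) gives 101, i.e. T+N+1 (scenario 1). With the receive posted early and nothing between ISEND and WAIT, the start time is simply the WAIT time (scenario 2); adding MPI_TEST-style calls moves it up to the first call after the ACK arrives.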
What you can do to mitigate this effect is call MPI_TEST one or more
times during your computation loop. Each call puts the sender back in
the MPI library, where it can notice the ACK and start the data
transfer, so non-blocking communications keep progressing while you're
off doing non-communication work. This is certainly sub-optimal, but
it will work.
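A minimal sketch of that mitigation, assuming the computation can be split into iterations: the loop calls a progress hook every poll_every iterations until the hook reports completion, then stops polling. The names progress_fn, compute_with_progress, and countdown_progress are hypothetical. In an MPI program the hook would wrap MPI_Test on the request returned by MPI_Isend, as shown in the comment; the countdown stub only exists so the loop can be tried without MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical progress hook: returns nonzero once the pending request
 * has completed. In an MPI program it could wrap MPI_Test:
 *
 *   static int mpi_progress(void *ctx) {
 *       MPI_Status status;
 *       int flag = 0;
 *       MPI_Test((MPI_Request *) ctx, &flag, &status);
 *       return flag;
 *   }
 */
typedef int (*progress_fn)(void *ctx);

/* Does n units of placeholder work on data, calling progress() every
   poll_every iterations until it reports completion. Returns the number
   of progress() calls made. */
size_t compute_with_progress(double *data, size_t n, size_t poll_every,
                             progress_fn progress, void *ctx)
{
    size_t calls = 0;
    int done = 0;

    assert(poll_every > 0);
    for (size_t i = 0; i < n; i++) {
        data[i] = data[i] * 2.0 + 1.0;      /* stand-in for real work */
        if (!done && (i + 1) % poll_every == 0) {
            done = progress(ctx);            /* let the library progress */
            calls++;
        }
    }
    return calls;
}

/* Stub hook for trying the loop without MPI: ctx points to an int
   countdown, and the "request" completes on the call that reaches 0. */
static int countdown_progress(void *ctx)
{
    int *remaining = ctx;
    return --*remaining <= 0;
}
```

The choice of poll_every is a trade-off: polling more often lets the sender notice the ACK sooner, at the cost of more library calls inside the compute loop.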
In short, LAM has limited overlap capability because a) it's
single-threaded and b) the rendezvous protocol requires at least some
intervention from the MPI library itself.
Hope that helps.
On Mar 22, 2007, at 1:01 PM, Vartan Padaryan wrote:
> Jeff Squyres wrote:
>> This sounds right. It's been eons since I've looked at the GM code
>> in LAM, but the short sends are eager and the long sends use a
>> rendezvous protocol. Specifically, the main content of the message
>> won't be sent for a "long" message until the receiver ACKs that a
>> matching MPI receive has been posted.
>>
>> This is, among other reasons, to help prevent resource exhaustion at
>> the receiver.
>>
>> I believe that there is an SSI parameter to change the size of short /
>> long messages, but I don't recall what it is offhand. Did you look in
>> the LAM/MPI User's Guide? I seem to recall documenting all such things
>> in there...
>>
>> --Jeff Squyres
>> Cisco Systems
>
> Yes, I know about the rpi_gm_tinymsglen parameter. But the point is
> that the Recv depends on completion of the Wait in the sending
> process, whereas the data transmission had finished long before: there
> is a large enough gap between the Isend and the Wait. In my opinion,
> the natural behavior for such a communication pattern is for the Recv
> to complete as soon as the data has arrived. But in practice, with a
> large buffer, the receiver can be stalled for an arbitrary period.
>
> WBR, Vartan Padaryan.
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
Jeff Squyres
Cisco Systems