MPI_Probe is evil and should be avoided -- it usually forces LAM to do
an extra internal memory copy.
Instead of using MPI_Wait, you can use MPI_Waitany so that you'll
unblock as soon as any of the requests completes (i.e., as soon as any
block arrives). MPI_Waitany tells you which request completed; you'll
have to map that index back to the block it corresponds to so that you
can process it appropriately (or, depending on your communication
protocol, that information may be carried in the message itself).
You can also use a non-blocking polling approach with MPI_Test if you
have other local work that can proceed independently, without the data
from the remote blocks.
Hope that helps.
On Dec 19, 2004, at 11:16 PM, Lei_at_ICS wrote:
> Hi,
>
> In parallel matrix multiplication, the C submatrices C1, C2, C3, etc
> are computed using A and B submatrix pairs (A1, B1), (A2, B2),
> (A3, B3), etc received from other PEs. If I loop over C1, C2, ...
> in that order, my MPI_Wait() may really have to spend time waiting
> for the submatrix pairs (Ai, Bi) to come, even if other pairs (Aj, Bj)
> have already arrived. So my question is: how do I pick the already
> arrived pairs to compute, so that my CPUs are mostly busy and
> the cost of communication is partially hidden? Is MPI_Probe()
> the right way to go? Do I need to maintain a queue myself to manage
> the skipped pairs (since they are still being communicated) so I can
> come back to them at a later time?
>
> Any suggestions are highly appreciated!
>
> Thanks,
>
> -Lei
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/