LAM/MPI Development Mailing List Archives

From: Philippe Combes (Philippe.Combes_at_[hidden])
Date: 2007-11-24 15:30:13


Brian Barrett wrote:
> On Nov 24, 2007, at 11:06 AM, Philippe Combes wrote:
>
>> I am seeing strange behaviour when cancelling an Irecv.
>>
>> First I call MPI_Test to check for completion of the request.
>> As it returns false, I call MPI_Cancel on the request.
>> The MPI_Cancel call is successful, but the rq_flags of the request
>> remain 0 (instead of 2).
>> I stepped into the code of MPI_Cancel and found that my request could
>> not be processed. Indeed, it has
>> rq_type == 4 // LAM_RQIRECV
>> rq_state == 4 // LAM_RQFSACTIVE
>> and this case is not handled. Why?
>>
>> Note that the matching send was posted before the first MPI_Test on
>> the recv request, but it has not yet completed.
>> It is actually cancelled too, but whether that cancel occurs before or
>> after the cancel of the recv request makes no difference.
>
>
> In LAM/MPI, neither sends nor receives can be cancelled once they've
> gone to the active state. A send request goes into the active state
> as soon as it's at the front of the line -- each transport has a
> slightly different definition of when this occurs, but the bottom line
> is that send requests move to the active state fairly rapidly, so
> canceling sends rarely works. On the receive side, receives enter the
> active state when the receive is matched (either by a short message or
> a rendezvous header).

Thank you for the explanation.
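
If I read this correctly, the rule amounts to something like the toy model
below. It is only a sketch of my understanding, not LAM/MPI's source: the
type, function and constant names are hypothetical, and the only numbers taken
from my debugging session are the active state value (4) and the cancelled
flag value (2); the other enum values are arbitrary.

```c
/* Toy model of "a request can only be cancelled before it becomes active".
   All names here are hypothetical; this is NOT LAM/MPI's internal code. */
enum toy_state {
    TOY_INIT,        /* request created, not yet started (arbitrary value) */
    TOY_STARTED,     /* started, not yet matched (arbitrary value)         */
    TOY_DONE,        /* communication finished (arbitrary value)           */
    TOY_ACTIVE = 4   /* matched / in progress; 4 is the value I observed   */
};

#define TOY_FLAG_CANCELLED 2   /* the rq_flags value I expected to see */

struct toy_request {
    enum toy_state state;
    int flags;
};

/* Like MPI_Cancel, this always returns success (0). It marks the request
   as cancelled only if the request has not yet reached the active state. */
int toy_cancel(struct toy_request *req)
{
    if (req->state == TOY_INIT || req->state == TOY_STARTED) {
        req->flags |= TOY_FLAG_CANCELLED;
    }
    return 0;
}

/* Reports whether the cancel actually took effect. */
int toy_test_cancelled(const struct toy_request *req)
{
    return (req->flags & TOY_FLAG_CANCELLED) != 0;
}
```

In this model, a cancel on a request that is already in the active state
returns success but leaves the flags at 0, which is what I observed.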

> Note that calling MPI_CANCEL on a request does not actually mean that
> it's cancelled. You have to call MPI_TEST_CANCELLED on the request to
> determine if the MPI either 1) has cancelled the request or 2) will
> complete the communication.

That is not exactly how I understood the man pages.
MPI_TEST_CANCELLED can only be called on a status, which in turn is set by a
completion function such as MPI_WAIT or MPI_TEST.
My problem here is that the MPI_WAIT I call right after the MPI_CANCEL never
returns, although the standard clearly specifies that it MUST return (with
either completion or cancellation of the request).
I first assumed this was due to the LAM_RQFSACTIVE state, but I might be
wrong: what should I look for?

Thanks,

Philippe