That is what Waitany does in a program with no faults. However, MPI
makes no guarantees about what happens in an application that has
errors at runtime. LAM is noticing that at least one process has gone
away without disconnecting properly when you call Waitany, and is
therefore generating an MPI exception.
Specifically, it's not Waitany itself that notices the problem --
it's the pass through LAM's progression engine that notices that a
process is now dead.
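
For illustration, here is a minimal sketch of how the caller could
cope with that exception instead of having the job aborted. The
function name wait_for_any_slave and the surrounding setup are my own
assumptions, not taken from Dominik's code; the key point is that
switching the communicator's error handler to MPI_ERRORS_RETURN (the
MPI-2 name; MPI_Errhandler_set in MPI-1) makes the error come back as
a return code from Waitany rather than a fatal exception:

#include <mpi.h>
#include <stdio.h>

/* Sketch: req_slaves/ntasks are assumed to come from earlier
 * MPI_Irecv calls, as in the original post. */
void wait_for_any_slave(int ntasks, MPI_Request *req_slaves)
{
    int idone, err;

    /* Errors on this communicator now come back as return codes
     * instead of aborting the application. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    err = MPI_Waitany(ntasks, req_slaves, &idone, MPI_STATUS_IGNORE);

    if (err != MPI_SUCCESS) {
        /* A peer likely died; recover or shut down gracefully. */
        fprintf(stderr, "MPI_Waitany failed with code %d\n", err);
    } else if (idone == MPI_UNDEFINED) {
        /* All requests were MPI_REQUEST_NULL: nothing to wait on,
         * exactly the case the man page describes. */
        printf("no active requests remain\n");
    } else {
        printf("request %d completed\n", idone);
    }
}

Whether the application can usefully continue communicating after
such an error is, of course, still implementation-dependent.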
On Nov 29, 2004, at 4:14 AM, Dominik Epple wrote:
> Hi list,
>
> I am trying to write some fault-tolerant code, guided by the example
> in the distribution.
>
> I have a call to MPI_Waitany:
>
> err=MPI_Waitany(ntasks, req_slaves, &idone, MPI_STATUS_IGNORE);
>
> This call produces
>
> MPI_Recv: process in remote group is dead (rank 0, MPI_COMM_PARENT)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Recv()
> Rank (0, MPI_COMM_WORLD): - main()
>
> if all elements of req_slaves are MPI_REQUEST_NULL. If I understand
> the documentation correctly, it should not abort, but complete.
> Cf. the man page:
>
> NOTES
> If all of the requests are MPI_REQUEST_NULL , then index
> is returned as MPI_UNDEFINED , and stat is returned as an
> empty status.
>
> What am I missing here?
>
> Thanks, Dominik.
--
Jeff Squyres
jsquyres_at_[hidden]
http://www.lam-mpi.org/