LAM/MPI General User's Mailing List Archives

From: Yaron Minsky (yminsky_at_[hidden])
Date: 2005-01-03 16:15:29


Gropp and Lusk wrote a paper called "Fault Tolerance in MPI
Programs"[1] that suggested an intercommunicator-based approach to
building a fault-tolerant master-slave style application. I've tried
to do something similar in LAM and have failed utterly. Has anyone
succeeded? And if so, do they have any examples that one could look
at?

The authors appear to have built their example using MPICH. Is MPICH
a more congenial environment for this class of applications?

Thanks in advance,
Yaron Minsky

[1] http://www-unix.mcs.anl.gov/~gropp/bib/papers/2002/mpi-fault.pdf