Sorry for the incredibly late followup on this thread. I had made
some notes then never sent it off. :(
Certainly PVM and MPI are similar in many ways. One way in which they
are different is their view of the communication world. PVM can be
more forgiving in this sense than MPI, or more specifically MPI-1.X.
In PVM you have the notion of spawning more processes to dynamically
grow and shrink an application. In MPI-2.Y, dynamic process support
was added to allow for similar operations. MPI-2 dynamics in
combination with intracommunicators have been one suggested way of
emulating the failure recovery scenario you outline below. This
harnesses the notion that the fate of one intracommunicator is
separate from the fate of another: if a process fails in one
intracommunicator, the rest of the processes continue to run (in
separate communicators), and errors are returned if your application
attempts to communicate with the failed process's group. This works
well for Manager/Worker type programs. For more details take a look
at the following paper:
http://dx.doi.org/10.1177/1094342004046045
http://www-unix.mcs.anl.gov/~gropp/bib/papers/2002/mpi-fault.pdf
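To make the pattern concrete, here is a rough sketch of the manager
side in C. This is my own illustration, not code from the paper:
"worker" is an assumed executable name, the task values are made up,
and whether a dead worker really surfaces as an error code rather
than an abort depends on the implementation (see the caveats below):

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm worker;            /* intercommunicator to one worker */
      int task = 42, result, rc;

      MPI_Init(&argc, &argv);

      /* Spawn a single worker; per-process error codes ignored. */
      MPI_Comm_spawn("worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL, 0,
                     MPI_COMM_SELF, &worker, MPI_ERRCODES_IGNORE);

      /* Ask for error codes instead of the default abort-on-error. */
      MPI_Comm_set_errhandler(worker, MPI_ERRORS_RETURN);

      /* If the worker dies, the failure should show up here as a
       * nonzero return code (implementation permitting), not as a
       * job-wide abort. */
      rc = MPI_Send(&task, 1, MPI_INT, 0, 0, worker);
      if (rc == MPI_SUCCESS)
          rc = MPI_Recv(&result, 1, MPI_INT, 0, 0, worker,
                        MPI_STATUS_IGNORE);

      if (rc != MPI_SUCCESS) {
          /* Recover: drop the dead communicator, respawn, resubmit.
           * (In practice even MPI_Comm_free may misbehave here in
           * some implementations.) */
          MPI_Comm_free(&worker);
          /* ... spawn a replacement and retry as above ... */
      }

      MPI_Finalize();
      return 0;
  }

The design point is that the manager talks to each worker through a
communicator of its own, so a failure only poisons that one
communicator rather than MPI_COMM_WORLD.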
All that being said, it takes a fairly robust and complete MPI-2.Y
implementation to support such dynamic operations. This method works
in some implementations, but certainly not all. Unfortunately I have
not tried this method in a few years, so I am unsure how a specific
MPI implementation will behave (including LAM/MPI and Open MPI).
Checkpoint/Restart has its place, but if your application is able and
willing to do the recovery work itself, then alternative methods can
be pursued. It may also be worth checking out FT-MPI's recovery
semantics to see if that model works for your application.
I hope that helps a bit. Sorry again for the late followup.
-- Josh
On Oct 17, 2006, at 1:36 PM, Alexander L. Belikoff wrote:
> Hmm... checkpoints and restarts are good stuff in general, but they
> require a lot of redesign and a significant amount of added
> complexity.
>
> The reason I raised the question in the first place was that I was
> toying with the idea of transitioning a distributed application that
> currently uses PVM to MPI. In PVM, handling peer process
> termination, as well as controlling the application topology in the
> VM, is fairly easy. Since we are obsessive (for good reason) about
> fault tolerance, our application (that is, its "rank 0" master
> process) knows when a "slave" dies and resubmits the job to another
> one. Moreover, we can also do cool things like "blacklisting" some
> slaves on a certain machine when we are confident the machine is not
> doing well, and get those slaves restarted after some period of
> time - all from within the master process!
>
> Unfortunately, I don't see how this level of service can be achieved
> in MPI (at least in a fairly standard-compliant implementation) -
> especially given your response. That is somewhat sad, since in a
> distributed application (which is MPI's "raison d'etre") there are
> plenty of points of failure, and many failures are not critical
> enough to justify a full application restart. It would be great to
> see a fairly simple API (no need for transactions/restarts/checkpoints)
> achieving just that - in my opinion, it would make MPI much more
> suitable for reasonably fault-tolerant applications (a requirement
> for many large systems, including the one I'm dealing with).
>
> Regards,
> -- Sasha
>
> Jeff Squyres wrote:
>>
>> LAM is -- at best -- only pseudo-able to handle the death of an MPI
>> process. Specifically, I wouldn't recommend trying to write a fault
>> tolerant MPI application using LAM/MPI that could withstand the death
>> of a process in MPI_COMM_WORLD.
>>
>> Keep in mind that MPI [quite intentionally] does not specify what
>> happens when a process dies, so it's totally up to the implementation
>> as to what to do. Most MPIs, LAM/MPI included, simply kill the rest
>> of the job. FT-MPI, out of the University of Tennessee, allows you
>> to do some interesting things, but you need to specifically write
>> code to their API, etc.
>>
>> Work is ongoing in Open MPI to be able to handle these kinds of
>> errors. The first step is adding checkpoint/restart capabilities in
>> Open MPI (the hardest part of which is all the infrastructure needed
>> to make that possible), and then we'll do more interesting things
>> after that (to include FT-MPI-like things).
>>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/