LAM/MPI Development Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2007-02-28 16:11:06


Nannan,

LAM/MPI uses a staggered all-to-all bookmark exchange algorithm to
drain the channels before a checkpoint operation. The LACSI 2003
paper describes this in more detail. You can find a copy on the
LAM/MPI website at the link below:
http://www.lam-mpi.org/papers/lacsi2003/
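
To illustrate the general idea of a bookmark exchange, here is a minimal
Python simulation. It is only a sketch under stated assumptions, not the
LAM/MPI code: the Process class, the per-pair queues, and the helper names
are all hypothetical, and the "staggering" is modelled simply as rank r
handling the bookmark from rank (r - step) % size in each step. Each process
counts the messages it has sent to and received from every peer; at
checkpoint time a receiver learns the sender's count (the bookmark) and
keeps receiving until its own count matches, at which point that channel is
drained.

```python
from collections import deque


class Process:
    """Hypothetical simulated MPI process (not a LAM/MPI data structure)."""

    def __init__(self, rank, size):
        self.rank = rank
        self.size = size
        self.sent = [0] * size      # messages sent to each peer
        self.received = [0] * size  # messages received from each peer


def send(channels, src, dst, payload):
    """Put a message in flight on the src -> dst channel."""
    channels[(src.rank, dst.rank)].append(payload)
    src.sent[dst.rank] += 1


def deliver_one(channels, src_rank, dst):
    """Receive one in-flight message from src_rank at dst."""
    channels[(src_rank, dst.rank)].popleft()
    dst.received[src_rank] += 1


def drain_channels(procs, channels):
    """Drain every channel using bookmarks before a (simulated) checkpoint."""
    size = len(procs)
    # Assumed staggering: in step k, rank r handles the peer (r - k) % size,
    # so each rank talks to a different peer in each step.
    for step in range(1, size):
        for p in procs:
            peer = procs[(p.rank - step) % size]
            bookmark = peer.sent[p.rank]  # what the peer says it sent to p
            # Keep receiving until p has seen everything the peer sent.
            while p.received[peer.rank] < bookmark:
                deliver_one(channels, peer.rank, p)


def demo():
    size = 4
    procs = [Process(r, size) for r in range(size)]
    channels = {(s, d): deque()
                for s in range(size) for d in range(size) if s != d}
    send(channels, procs[0], procs[1], "a")
    send(channels, procs[0], procs[1], "b")
    send(channels, procs[2], procs[3], "c")
    deliver_one(channels, 0, procs[1])  # one message consumed before draining
    drain_channels(procs, channels)
    return procs, channels
```

Calling demo() leaves every simulated channel empty, which is the state a
coordinated checkpoint needs before the process images are saved. The
paper linked above describes what LAM/MPI actually does.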

As you might have gathered from the LAM/MPI mail archives, most of the
LAM/MPI developers have transitioned over to the Open MPI project
(http://www.open-mpi.org/). In the near future the Open MPI project will
also support checkpoint/restart capabilities. The first
implementation is similar to the LAM/MPI implementation described in
the paper above. Open MPI goes beyond LAM/MPI in that it introduces a
more extensible framework for exploring various Checkpoint/Restart
Coordination Protocols (e.g., ) and support services (e.g.,
Checkpoint Servers, Event Loggers, ...). If you are interested in this
new feature of Open MPI be sure to watch the Open MPI mailing lists
for when it becomes part of the main development trunk.

Cheers,
Josh

On Feb 28, 2007, at 3:41 PM, Nannan Ayya wrote:

> Hi all,
> I am working with a 10-node LAM/MPI-based cluster with BLCR. I would
> like to know what algorithm or protocol is used to coordinate the
> checkpointing behavior. I read in the mail archives that it's a
> modified implementation of the Chandy-Lamport algorithm, but that
> was in the 2004 archives. Can somebody let me know how coordination
> is currently done during checkpointing (on a call to cr_checkpoint)?
> If there is documentation of the algorithm used, it would be great
> if you could point me to the appropriate link. We are working on our
> bachelor's thesis in college and would like to know about the
> coordination process used to get a global snapshot of the MPI
> application.
> Thanks in advance,
> Nannan

----
Josh Hursey
jjhursey_at_[hidden]
http://www.open-mpi.org/