LAM/MPI Development Mailing List Archives

From: Nannan Ayya (nannan.r_at_[hidden])
Date: 2007-02-28 15:41:58


Hi all,
     I am working with a 10-node LAM/MPI-based cluster with BLCR. I would
like to know what algorithm or protocol is used to coordinate the
checkpointing behavior. I read in the mail archives that it's a modified
implementation of the Chandy-Lamport algorithm, but that was in the 2004
archives. Can somebody let me know how the coordination is currently
done during checkpointing (on a call to cr_checkpoint)? If there is
documentation of the algorithm used, it would be great if you could point me
to the appropriate link. We are working on our bachelor's thesis in
college and would like to know about the coordination process used to get a
global snapshot of the MPI application.
    Thanks in advance,
      Nannan