LAM/MPI General User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2009-08-19 08:34:04


On Aug 19, 2009, at 3:28 AM, Blaise-Omer.Yenke_at_[hidden] wrote:

> Hi all
>
> I'm conducting some experiments to evaluate the checkpointing time
> of a parallel application with LAM/MPI.

You may want to consider using Open MPI, since LAM/MPI is in
maintenance mode and no longer being actively developed.

>
> I'd like to know whether the processes of the application are saved
> one after another or in parallel, after the synchronization phase.

Once the checkpoint message coordination protocol is finished all of
the checkpoints are written in parallel from each process in the
parallel job.
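
As a rough illustration of that ordering (coordinate first, then write concurrently), here is a toy sketch in Python. It is not LAM/MPI or Open MPI code: threads stand in for MPI processes, a `threading.Barrier` stands in for the checkpoint coordination protocol, and the names `run_checkpoint` and `ckpt_rank<N>.txt` are hypothetical.

```python
import os
import tempfile
import threading


def run_checkpoint(num_procs, directory):
    """Toy model: every 'process' synchronizes, then all write concurrently."""
    # Assumption: a barrier stands in for the real coordination protocol.
    barrier = threading.Barrier(num_procs)
    paths = [None] * num_procs

    def worker(rank):
        # Toy per-process state; a real checkpointer saves the process image.
        state = {"rank": rank, "data": list(range(rank + 1))}
        # No one starts writing until every participant has reached this point.
        barrier.wait()
        # Each participant writes its own file, independently of the others.
        path = os.path.join(directory, f"ckpt_rank{rank}.txt")
        with open(path, "w") as f:
            f.write(repr(state))
        paths[rank] = path

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for p in run_checkpoint(4, d):
            print(os.path.basename(p))
```

The point of the sketch is only the structure: no per-process write waits for another process's write to finish, so the write phase costs roughly as much as the slowest single process, unless shared storage becomes the bottleneck.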

>
> I'd be grateful for any references.

There is one paper on checkpoint/restart in LAM/MPI, and two on the
implementation in Open MPI. All can be found at the link below:
   http://osl.iu.edu/publications/Keyword/CHECKPOINTRESTART.php

Best,
Josh

>
> Regards.
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/