LAM/MPI General User's Mailing List Archives

From: Brian W. Barrett (brbarret_at_[hidden])
Date: 2008-03-17 13:04:59


On Fri, 14 Mar 2008, Blaise-Omer.Yenke_at_[hidden] wrote:

> I'd like to measure the synchronisation time for the checkpoint of an MPI
> job. To do so, I'd like to know the data structure of the bookmark
> exchanged among the job's processes before they are individually
> checkpointed. I'd also like to know if the bookmark's size is fixed,
> whatever the size of the job and its number of processes.

The data structure sent between peers for bookmarking is a simple
structure containing a couple of fields (bytes / messages in flight,
mainly). It's less than 32 bytes, so it's pretty small. I don't remember
the exact details, but the code is available on our web page. The
bookmark structure's size is fixed -- it does not scale with the number of
processes in the job. However, the total amount of data sent to set up a
checkpoint does scale with the number of nodes (as there are more
instances of the bookmark structure being sent around).
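
To make the fixed-size versus total-traffic distinction concrete, here is a
rough sketch in Python. It is not the LAM/MPI code. The two-field layout
(two unsigned 64-bit counters), the names, and the assumption that every
process sends one bookmark to every other process are all hypothetical,
chosen only to illustrate the point; check the actual source for the real
structure and exchange pattern.

```python
import struct

# Hypothetical bookmark layout (NOT taken from the LAM/MPI source):
# two unsigned 64-bit counters, little-endian, no padding.
#   field 1: bytes in flight to the peer
#   field 2: messages in flight to the peer
BOOKMARK_FORMAT = "<QQ"
BOOKMARK_SIZE = struct.calcsize(BOOKMARK_FORMAT)  # fixed, independent of job size


def pack_bookmark(bytes_in_flight, messages_in_flight):
    """Serialize one bookmark into its fixed-size wire form."""
    return struct.pack(BOOKMARK_FORMAT, bytes_in_flight, messages_in_flight)


def total_bookmark_bytes(num_processes):
    """Estimate the total bookmark traffic for one checkpoint.

    Assumption (hypothetical): each process sends exactly one bookmark
    to every other process, i.e. an all-pairs exchange.
    """
    return num_processes * (num_processes - 1) * BOOKMARK_SIZE


if __name__ == "__main__":
    # Per-bookmark size stays constant; only the total grows with the job.
    for n in (2, 4, 8, 16):
        print(n, BOOKMARK_SIZE, total_bookmark_bytes(n))
```

Under these assumptions each bookmark stays the same size no matter how
large the job is, while the total traffic grows with the number of
bookmarks exchanged.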

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!