On Fri, 14 Mar 2008, Blaise-Omer.Yenke_at_[hidden] wrote:
> I'd like to measure the synchronisation time for the checkpoint of an MPI
> job. To do so, I'd like to know the data structure of the bookmark
> exchanged among the job's processes before they are individually
> checkpointed. I'd also like to know if the bookmark's size is fixed
> whatever the size of the job and its number of processes.
The data structure sent between peers for bookmarking is a simple
structure containing a couple of fields (bytes / messages in flight,
mainly). It's less than 32 bytes, so it's pretty small. I don't remember
the exact details, but the code is available on our web page. The
bookmark structure's size is fixed -- it does not scale with number of
processes in the job. However, the total amount of data sent to set up a
checkpoint does scale with the number of nodes (as there are more
instances of the bookmark structure being sent around).
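
As a rough illustration only (this is not the actual LAM/MPI definition,
which is in the source code), a bookmark of this kind might look like the
sketch below. The type name bookmark_t, its two field names, and the helper
bookmark_traffic_bytes are all hypothetical. The helper also assumes that
every process sends one bookmark to every other process, which may not match
the real exchange pattern. The point is that the per-bookmark size stays
constant while the total traffic grows with the job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bookmark: a fixed-size record of traffic still in flight.
 * Field names are illustrative, not taken from the LAM/MPI source. */
typedef struct {
    uint64_t bytes_in_flight; /* bytes sent to the peer, not yet drained */
    uint64_t msgs_in_flight;  /* messages sent to the peer, not yet drained */
} bookmark_t;

/* The size is a compile-time constant, independent of the job size. */
static_assert(sizeof(bookmark_t) < 32, "bookmark should stay under 32 bytes");

/* Total bookmark payload for one checkpoint, ASSUMING an all-to-all
 * exchange: each of nprocs processes sends one bookmark to each peer. */
size_t bookmark_traffic_bytes(size_t nprocs)
{
    if (nprocs < 2) {
        return 0; /* a single process has no peers to exchange with */
    }
    return nprocs * (nprocs - 1) * sizeof(bookmark_t);
}
```

Under these assumptions, sizeof(bookmark_t) is the same for a 4-process job
and a 400-process job. Only the value returned by bookmark_traffic_bytes
changes.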
Hope this helps,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!