LAM/MPI General User's Mailing List Archives

From: John Paul Walters (jwalters_at_[hidden])
Date: 2006-04-04 18:31:11


        I have a couple of questions regarding the LAM "self" checkpoint
        module. The first problem that concerns me is a crash that
        occurs whenever a lamcheckpoint request is issued. The segfault
        occurs in ssi_crlam_self.c, when create_app_schema calls
        free(tmp_as) (the third-from-last statement within
        create_app_schema). To rule out any possibility that my own
        checkpointing library is interfering, I've gone as far as
        removing the calls to it and replacing them with simple
        printfs. Could this be a bug in the self
        checkpoint module?
        
        Also, what functionality should my checkpointing library provide
        with respect to restarting the checkpointed MPI job? Does my
        checkpoint/restart library need to make a call to MPI_Init()
        upon restart? Short of that, how else can I reinitialize the
        communication channels?
        
        Thanks,
        JP