LAM/MPI General User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2006-04-05 22:22:44


JP,

I have been taking a look at the 'self' module due to the questions
that have been asked on the list lately. Unfortunately I believe I
have discovered a bug or two with that module. I'm taking a look at
it now, and will reply back with more details and information.

Which version of LAM/MPI were you using when you encountered the
problem that you described?

Sorry I can't be of more help at the moment, but I'll post back soon.

Cheers,
Josh

On Apr 4, 2006, at 6:31 PM, John Paul Walters wrote:

>
> I have a couple of questions regarding the LAM "self" checkpoint
> module. The first problem that concerns me is a crash that occurs
> whenever a lamcheckpoint request is issued. The segfault occurs in
> ssi_crlam_self.c, when create_app_schema calls free(tmp_as) (3rd
> from the last statement within create_app_schema). I've gone as far
> as removing the calls to the checkpointing library that I've
> provided and replacing them with simple printfs, to avoid any
> possibility that my library is interfering. Could this be a bug in
> the self checkpoint module?
>
> Also, what functionality should my checkpointing library provide
> with respect to restarting the checkpointed MPI job? Does my
> checkpoint/restart library need to make a call to MPI_Init() upon
> restart? Short of that, how else can I reinitialize the
> communication channels?
>
> Thanks,
> JP
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/