Josh,
Thanks for the response. I've been using LAM version 7.2b1svn10281.
Regards,
JP
On Wed, 2006-04-05 at 22:22 -0400, Josh Hursey wrote:
> JP,
>
> I have been taking a look at the 'self' module due to the questions
> that have been asked on the list lately. Unfortunately I believe I
> have discovered a bug or two with that module. I'm taking a look at
> it now, and will reply back with more details and information.
>
> Which version of LAM/MPI are you using in which you encountered the
> problem that you highlighted?
>
> Sorry I can't be much more help at the moment, but I'll post back soon.
>
> Cheers,
> Josh
>
> On Apr 4, 2006, at 6:31 PM, John Paul Walters wrote:
>
> >
> > I have a couple of questions regarding the LAM "self" checkpoint
> > module. The first problem that concerns me is a crash that occurs
> > whenever a lamcheckpoint request is issued. The segfault occurs in
> > ssi_crlam_self.c, when create_app_schema calls free(tmp_as) (3rd
> > from the last statement within create_app_schema). I've gone as far
> > as removing the calls to the checkpointing library that I've
> > provided and replacing them with simple printfs, to avoid any
> > possibility that my library is interfering. Could this be a bug in
> > the self checkpoint module?
> >
> > Also, what functionality should my checkpointing library provide
> > with respect to restarting the checkpointed MPI job? Does my
> > checkpoint/restart library need to make a call to MPI_Init() upon
> > restart? Short of that, how else can I reinitialize the
> > communication channels?
> >
> > Thanks,
> > JP
> >
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
> ----
> Josh Hursey
> jjhursey_at_[hidden]
> http://www.lam-mpi.org/
>