I was using lam-7.0.6 because I assumed 7.1.1 still had that problem
with BLCR.
I tried again with lam-7.1.2b8. The cr_restart problem is still there.
But this time cr_checkpoint generates a checkpoint file of about 8.5MB for
every MPI process, no matter how simple the program is.
Here are the configure options:
BLCR: --prefix=/usr/local/blcr --with-lam=/usr/local/mpi/lam-blcr
LAM:  --prefix=/usr/local/mpi/lam-blcr
      --with-cr-blcr=/usr/local/blcr --with-rpi=crtcp
My distro is Slackware 10 with the stock 2.4.26 kernel. I also tried
building BLCR without --with-lam; the result is the same. The only files
that differ between the two builds are $PREFIX/lib/libcr.*.
However, BLCR works fine with ordinary programs.
Jeff Squyres wrote:
> Yes, I tried it before it was released (we work together on such
> things). I actually don't know what the --with-lam option does, but I
> wasn't able to get a hang using cr_restart. Are you able to
> checkpoint / restart simple MPI applications (e.g., hello world with a
> big sleep() in the middle?). What version of LAM are you using? Have
> you tried the latest LAM beta (7.1.2b8)?
>
> On Nov 14, 2004, at 5:08 AM, Neville Lee wrote:
>
>> Has anybody tried the new blcr-0.3.0 with LAM? It has an additional
>> configure option --with-lam=DIR. The kernel module cr.o is renamed to
>> blcr.o.
>> cr_checkpoint works fine. However, cr_restart gets stuck when trying to
>> restart from a checkpoint.
>> Any idea?
>> _______________________________________________
>> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>