
LAM/MPI General User's Mailing List Archives


From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-12-24 14:23:04


On Dec 13, 2005, at 10:14 PM, Liu Xuezhao wrote:

First, sorry about the slow reply. The middle of December is the end
of our academic year and things are always a bit crazy :).

> Recently I have been experimenting with fault tolerance using LAM/MPI
> and BLCR. It usually works well: I killed the MPI program with
> Ctrl+C, and it could be restarted with "cr_restart context.xxxx".
> But I found that an MPI application can't be restarted if the LAM RTE
> (Run Time Environment) has also been restarted. I tested it like this:

<snip>

> When lamhalt and lamboot are executed, the LAM RTE is rebooted and
> the "/tmp/lam-xxx_at_node01" directory is re-created. When cr_restart
> is then executed, BLCR needs to reopen the file
> "/tmp/lam-xxx_at_node01/lam-crtcp-rank-1.txt", cannot find it, and
> fails to restart the MPI program.

Did you happen to configure LAM/MPI with the --with-debug option?
That file will be created by LAM/MPI if the --with-debug option is
enabled in order to allow the LAM developers to more easily debug
issues with the crtcp rpi component. Under ordinary circumstances,
this file should not be created. Unless you are trying to debug the
internals of LAM/MPI, I would not recommend compiling LAM with the
--with-debug option -- it incurs a significant performance hit. If you
just want debugging symbols in LAM/MPI, set CFLAGS=-g when running
configure.

You can look at the top of the config.log file in the LAM/MPI source
tree to see what configure options you passed when building LAM/MPI.
If you think you didn't configure with the --with-debug option, could
you send me the following files (relative to the top of the source
directory):

   config.log
   share/include/lam_config.h
   share/ssi/rpi/crtcp/src/lam-ssi-rpi-crtcp-config.h
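
If it helps, the config.log check described above can be scripted. The following is only a rough sketch, not anything shipped with LAM/MPI: it assumes the usual Autoconf layout, where the configure command line is recorded near the top of config.log on a line beginning with "$", and the function names are made up for illustration.

```python
import shlex


def configure_invocation(config_log_text, max_lines=50):
    """Return the configure command line recorded near the top of
    config.log, or None if no such line is found.

    Assumption: Autoconf writes the invocation as a line of the form
    "  $ ./configure <args>" within the first few dozen lines.
    """
    for line in config_log_text.splitlines()[:max_lines]:
        stripped = line.strip()
        if stripped.startswith("$ ") and "configure" in stripped:
            return stripped[2:]
    return None


def debug_flags(config_log_text):
    """Return the list of --with-debug arguments on the recorded
    configure command line (empty list if none, None if the command
    line could not be located)."""
    command = configure_invocation(config_log_text)
    if command is None:
        return None
    # shlex.split handles arguments that were quoted on the command line.
    return [arg for arg in shlex.split(command)
            if arg == "--with-debug" or arg.startswith("--with-debug=")]
```

To use it, read config.log from the top of the LAM/MPI source tree and pass its contents to debug_flags(), for example debug_flags(open("config.log").read()). A non-empty result means a --with-debug argument was passed to configure; the other two files in the list still need to be checked by hand.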

Thanks!

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/