Hi -
Sorry, but what you are asking for isn't possible. The MPI standard
specifically says that you cannot call MPI_Init twice in the same process,
which is what you want to do.
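To make the rule concrete, here is a toy model of the per-process
lifecycle the standard describes. It is not LAM/MPI code, and the type
and function names are made up for illustration. A process starts
uninitialized, MPI_Init moves it to initialized, MPI_Finalize moves it
to finalized, and there is no transition back. A real program can ask
the library which state it is in with MPI_Initialized() and, in
implementations that provide the MPI-2 call, MPI_Finalized(), but
neither of them lets you initialize again.

```c
#include <stdbool.h>

/* Hypothetical toy model of the per-process MPI lifecycle; not LAM/MPI code. */
typedef enum {
    MODEL_UNINITIALIZED, /* before MPI_Init */
    MODEL_INITIALIZED,   /* between MPI_Init and MPI_Finalize */
    MODEL_FINALIZED      /* after MPI_Finalize: terminal, no way back */
} model_state;

/* Models MPI_Init: legal only from the uninitialized state.
 * Returns false for a second Init or for an Init after Finalize,
 * and leaves the state unchanged in that case. */
static bool model_init(model_state *s)
{
    if (*s != MODEL_UNINITIALIZED)
        return false;
    *s = MODEL_INITIALIZED;
    return true;
}

/* Models MPI_Finalize: legal exactly once, after a successful Init. */
static bool model_finalize(model_state *s)
{
    if (*s != MODEL_INITIALIZED)
        return false;
    *s = MODEL_FINALIZED;
    return true;
}

/* Replays the scenario from the question within one process:
 * Init, Finalize (before the checkpoint), then Init again.
 * Returns true if the model rejects the second Init. */
static bool model_reinit_rejected(void)
{
    model_state s = MODEL_UNINITIALIZED;
    bool first_init  = model_init(&s);
    bool finalized   = model_finalize(&s);
    bool second_init = model_init(&s);
    return first_init && finalized && !second_init && s == MODEL_FINALIZED;
}
```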
That being said, you should really take a look at the built-in
checkpoint/restart capabilities of LAM/MPI. Using the BLCR package
from Berkeley Lab, it is possible to checkpoint/restart a process on
Linux without any user intervention. In the upcoming LAM 7.1 release,
we will have the ability for the user to provide his own hooks for
doing the checkpoints in user space, which should give you the
flexibility you are looking for.
Hope this helps,
Brian
On May 28, 2004, at 7:30 AM, Franiu wrote:
> Hi!
>
> Since I am new to this group, I'd like to welcome all the members :-)
>
> I am currently developing a solution for checkpointing/restarting
> (possibly migrating) MPI-enabled applications. I've decided to use a
> user-level checkpointing library ckpt to freeze the process state.
> My thought was that, after being given a specific signal, the
> application (meaning all the distributed processes) chooses a safe
> point, excluding all the risks of losing travelling messages, and
> performs a distributed checkpoint of each process separately, preceded
> by calling MPI_Finalize. After the restart of each node, the processes
> are aware of being restarted (that includes several modifications of
> the execution environment), and try to call MPI_Init for re-creation
> of MPI communicator and ranks. The problem is that although the
> processes are in fact new (different PIDs), LAM/MPI still recognizes
> that MPI_Finalize had been called and refuses the creation of a new
> MPI world.
>
> My question is: is there any way of changing this behaviour? Is it
> possible to tell the MPI routines and structures to set themselves up
> from scratch and make it possible to call MPI_Init again?
>
> I should probably add that such a solution is needed to achieve some
> level of transparency of the checkpointing mechanism to the
> programmer.
>
> --
> Greeting,
> Marcin Frączak mailto:marcin.f_at_[hidden]
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/