LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-10-28 06:14:32


On Oct 27, 2004, at 3:30 PM, Neville Lee wrote:

> I am doing an academic project that relies on dynamic migration of
> MPI processes. I am thinking of using BProc as the underlying library.
> However, when a process calls bproc_move() to migrate, it also breaks
> the MPI communication mechanism. What I need is some library routine
> to inform MPI of the migration of the process.

LAM/MPI has the capability to checkpoint and restart entire parallel
jobs. It probably wouldn't be too hard to checkpoint, migrate, and
restart single processes. We had always planned to add that capability
to LAM (it's mainly some additional bookkeeping on top of what is
already there), but got sidetracked when we started working on Open MPI
last January. It is therefore unlikely that we will do this additional
work in LAM -- instead we will likely [eventually] have this capability
in Open MPI.

> Mr. Dimick at LA-MPI told me that OpenMPI is planning to support
> process migration. Since I cannot access their mailing list, I want
> to ask whether anybody here knows when this is scheduled to be
> supported. I also want to ask whether it would be feasible for me to
> write a library routine like the one mentioned above.

http://www.open-mpi.org/ lists a users mailing list; were you unable to
sign up on it?

Before answering, let me stress that Open MPI is currently unavailable
to the public. We are working feverishly to finish it, but it
won't have a stable release for at least a few more months. Hence, all
my comments here about unreleased software are speculative and
subject to change once we finally have a stable release. :-)

We have not yet tackled checkpoint / restart or process migration in
Open MPI. In some ways, because of the groundwork and infrastructure
that has been created to support data fault tolerance in Open MPI,
doing checkpoint / restart and/or migration may be easier in Open MPI
(vs. LAM/MPI). However, I do not anticipate actively working on this
stuff for at least a few more months. As such, you might have better
luck working with LAM/MPI (at least in the short term).

Please note that this is the LAM/MPI mailing list, not an Open MPI
mailing list. While the members of the LAM team are involved in both
projects, we would generally prefer that this list be restricted to
LAM-related questions (or questions that apply to both LAM and Open
MPI).

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/