LAM/MPI Development Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-01-10 09:05:07


On Jan 9, 2006, at 8:29 PM, Liu Xuezhao wrote:

>> BTW: in the LAM/MPI source code, what does "MPI_COMM_SPAWN'ed
>> processes" mean?
> I think MPI_COMM_SPAWN corresponds to the dynamic process creation
> of the MPI-2 standard. MPI-2 adds a new API, MPI_COMM_SPAWN, which
> can spawn several new processes to form a new process group.
> It seems that LAM/MPI does not support checkpointing
> MPI_COMM_SPAWN'ed processes. But I am not sure; I am not
> familiar with it. ;)

Correct -- we do not support the MPI-2 dynamic functions (such as
MPI_COMM_SPAWN) with LAM's checkpoint/restart. There are many race
conditions and a general question of "what, exactly, should we
checkpoint? All connected processes?" (per the MPI-2 definition of
"connected"). We started working on Open MPI before we could fully
answer these questions in LAM.
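
To illustrate why "all connected processes" is a hard boundary to pin down, here is a rough sketch of the set computation it implies. MPI-2 treats connectivity as transitive: processes that share a communicator are connected, and if A is connected to B and B to C, then A is connected to C. This is not LAM or Open MPI code. The names `conn_set_t`, `conn_share_communicator`, `conn_checkpoint_set`, and `MAX_PROCS` are hypothetical, processes are modeled as plain integers rather than real MPI ranks, and the sketch ignores windows, file handles, and MPI_COMM_DISCONNECT.

```c
#include <assert.h>

#define MAX_PROCS 64  /* hypothetical upper bound, for this sketch only */

/* Union-find over process ids: one tree per group of connected processes. */
typedef struct {
    int parent[MAX_PROCS];
    int nprocs;
} conn_set_t;

/* Start with every process connected only to itself. */
void conn_init(conn_set_t *s, int nprocs)
{
    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    s->nprocs = nprocs;
    for (int i = 0; i < nprocs; ++i)
        s->parent[i] = i;
}

/* Return the representative of p's connected group (with path halving). */
int conn_find(conn_set_t *s, int p)
{
    assert(p >= 0 && p < s->nprocs);
    while (s->parent[p] != p) {
        s->parent[p] = s->parent[s->parent[p]];
        p = s->parent[p];
    }
    return p;
}

/* Record that all listed processes belong to one communicator
 * (intra- or inter-), which makes them mutually connected. */
void conn_share_communicator(conn_set_t *s, const int *members, int n)
{
    for (int i = 1; i < n; ++i) {
        int a = conn_find(s, members[0]);
        int b = conn_find(s, members[i]);
        if (a != b)
            s->parent[b] = a;
    }
}

/* Write every process connected to p into out[] (ascending order) and
 * return how many there are. out must have room for s->nprocs entries. */
int conn_checkpoint_set(conn_set_t *s, int p, int *out)
{
    int root = conn_find(s, p);
    int count = 0;
    for (int i = 0; i < s->nprocs; ++i)
        if (conn_find(s, i) == root)
            out[count++] = i;
    return count;
}
```

For example, if processes 0-2 share one communicator and process 0 then spawns processes 3 and 4 (so 0, 3, and 4 share an intercommunicator), the set reported for process 2 grows to all five processes, even though 2 never talks to 3 or 4 directly. The set can change each time a spawn completes, which is one place where races with a checkpoint request could arise.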

However, now that Open MPI is stable, a student is actively working
on checkpoint/restart support in Open MPI. We hope to have LAM-like
BLCR support ready in the next few months. Then we'll resume the
real work, such as process migration, figuring out what to do with
MPI-2 dynamics, etc.

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/