On Aug 29, 2005, at 11:49 AM, Troy Telford wrote:
>> I need to know why Checkpoint/Restart has not been implemented
>> for IB. Is it possible to implement Checkpoint/Restart in IB?
>
> I'll make a bit of a wild guess here: From reading previous posts, IB
> support in LAM is not very complete. I'm sure there are some fairly good
> reasons for this; I suspect one of them is the state of the IB drivers is
> not particularly stable (meaning they are rapidly changing, and hitting
> the moving target is difficult).
>
> Then there's OpenMPI, which I believe is the intended successor to
> LAM...
This is the main reason. Shortly after we started working on IB in
earnest for LAM, we started working on Open MPI, which pretty much
made IB support in LAM obsolete (and not worth continuing).
> I'm fairly confident that it's possible to implement Checkpoint/Restart
> in IB; it just hasn't been done yet.
That is also correct. Any of the RPIs in LAM *can* be checkpointed;
it's just a matter of writing the code to do so.
At this point, LAM is pretty much in maintenance mode (7.1.2 will
escape someday; I swear it!) -- we're doing bug fixes and whatnot, but
not really any new functionality.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/