
LAM/MPI General User's Mailing List Archives


From: Jeff Layton (jeffrey.b.layton_at_[hidden])
Date: 2004-10-15 13:57:21


William Bierman wrote:

>>The kind of fail-over system you're describing is really one where every
>>node can be a master, which basically means that they're equivalent in
>>capability. The "master" is nominated by an election based on some sort
>>of rules (i.e. first to boot, fastest network, whatever). You could
>>possibly do this with LAM by using shell scripts to configure the
>>election winner as the master, say by NFS, and from there configure the
>>slaves. The scripts would be pretty complex though. As Bogdan says,
>>you'll need a hot-swap master as well in case the master comes down. If
>>a slave dies, you'd have to bring down the LAM universe and bring it
>>back up minus the dead node. More scripting. It's doable, but it's a
>>lot of work.
>>
>>
>
>Yes, this is exactly my goal. And yes, it is quite a lot of work.
>That is why I wanted to avoid re-inventing the wheel as much as
>possible. Doing it all from scratch does have some benefits though.
>
>Does anyone know what work has been done to show whether or not LAM
>(or any MPI suite) can adapt a running process without the process
>being aware of it, should a node drop out? This should theoretically
>be possible, especially if you're willing to use some sort of a
>kludge.. say just restart the process doubling up on a processor, and
>re-assign the dropped out node's 'id'. There are still of course
>issues with that (the memory that was lost, etc.) .. but I am by no
>means an expert in this field.
>

   Well, I'm not an expert either :) However, you might look at Jeff's
new project, Open MPI (www.open-mpi.org). It incorporates FT-MPI
from the University of Tennessee (http://icl.cs.utk.edu/ftmpi/). I've never
used it, but here's a quote from their front page,

"FT-MPI survives the crash of n-1 processes in an n-process job, and, if
required, can respawn/restart them. However, it is still the responsibility
of the application to recover the data-structures and the data on the
crashed processes."

I'm not sure this meets your requirements, since it says that you have to
recover the data-structures yourself. One other option is to make each node
an HA pair, so that if you lose a node its partner takes over. I don't know
how well this works with MPI, though.
   With MPI-2 you can dynamically add and subtract processes. This might
buy you something depending upon what you are doing.
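To be concrete, the MPI-2 call for this is MPI_Comm_spawn, which launches new
processes from a running job and hands back an intercommunicator to them. Note
it won't transparently replace a dead rank - the application still has to
re-assign the lost node's work itself. A minimal sketch (the "./worker" binary
name is made up for illustration):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Spawn two extra copies of a worker binary from the running job.
       "children" is an intercommunicator linking us to the new ranks. */
    MPI_Comm children;
    int errcodes[2];
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                   0 /* root rank */, MPI_COMM_WORLD, &children, errcodes);

    /* Hand the new ranks some work, e.g. the part a lost node held.
       The payload here is just a placeholder. */
    int task = 42;
    MPI_Send(&task, 1, MPI_INT, 0 /* dest rank in children */, 0, children);

    MPI_Comm_disconnect(&children);
    MPI_Finalize();
    return 0;
}
```

You'd still need your own bookkeeping to detect the failure and decide what to
respawn, which is where FT-MPI tries to help.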
   ICL also has a project called HARNESS (http://icl.cs.utk.edu/harness/)
that you might be interested in.
   I've followed this thread sort of half-heartedly, but let me ask a
question.
Why are you interested in so much redundancy in a cluster? (I'm not being
accusatory, just curious). I've run MPI jobs for several weeks without any
problems. I've also had clusters stay up without any failure for almost 12
months - and that was running 24/7. Do you have an app that needs to run
an extremely long time? Have you looked at checkpointing your code?
(Again, just curious).
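For what it's worth, checkpointing in its simplest form just means periodically
writing your state to disk so a restarted run can pick up where it left off.
A minimal sketch in plain C (the state struct and names are purely
illustrative, not from any library):

```c
#include <stdio.h>

/* Illustrative application state: an iteration counter plus a
   small result buffer. */
typedef struct {
    int iteration;
    double data[4];
} state_t;

/* Write the whole state to disk; returns 0 on success, -1 on error. */
int save_checkpoint(const char *path, const state_t *s)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(s, sizeof(*s), 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}

/* Read the state back; returns 0 on success, -1 if no checkpoint exists. */
int load_checkpoint(const char *path, state_t *s)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(s, sizeof(*s), 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

In the main loop you'd call save_checkpoint every N iterations, and on startup
try load_checkpoint first to resume rather than start from iteration zero.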

Jeff

-- 
Dr. Jeff Layton
Aerodynamics and CFD
Lockheed-Martin Aeronautical Company - Marietta