LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2008-03-28 23:05:18


On Mar 19, 2008, at 7:16 AM, gauri dhopavkar wrote:
> I am doing a postgraduate degree in Computer Science. My project topic
> is Grid computing. I have built a LAM/MPI cluster in our college lab
> which allows parallel execution of jobs. Please answer these queries:
>
> 1. Are there any standard ready-made applications which can be run
> on this cluster for demonstration purposes?
>

There are many. Have a look in the examples/ directory of the
LAM/MPI tarball for simple ones, or a quick Google search should
find a good set of MPI applications.

> 2. Is there any mechanism for storing the status (details) of a
> process executing on one node of the cluster on another node? This
> is needed in case of node failure.
>

This is not part of the MPI standard. Generally, applications use
custom checkpointing mechanisms or system-level checkpointing for
handling node failures. LAM/MPI supports integration with the BLCR
system-level checkpointer on Linux systems. Have a look at our paper
on the subject for more details:

   http://www.lam-mpi.org/papers/lacsi2003/lacsi-2003.pdf

> 3. How to take snapshot of a process state?
>

This is a difficult task. I'd recommend the above paper for more
details on system-level checkpointing. If you search the ACM or IEEE
digital libraries, I'm sure you'll find a number of papers on system-
and application-level checkpointing for MPI. It's a complex topic
with lots of tradeoffs.
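
As a rough illustration of the application-level approach, here is a
minimal single-process sketch in Python. It is not LAM/MPI or BLCR
code; the function names (save_checkpoint, load_checkpoint, run), the
JSON file format, and the fail_at parameter used to simulate a node
failure are all assumptions made up for this example. The idea is
only that the program periodically writes its own state to stable
storage and, on restart, resumes from the last saved state.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to path atomically (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename it over the target,
    # so a crash mid-write never leaves a truncated checkpoint behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every=10, fail_at=None):
    """Sum 0..total_steps-1, checkpointing every few steps."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        if fail_at is not None and state["step"] == fail_at:
            # Stand-in for a node failure in this sketch.
            raise RuntimeError("simulated node failure")
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If run() is interrupted and then called again with the same path, it
picks up from the last checkpoint rather than from step 0. In a real
MPI job the checkpoint would need to live on storage that survives
the node (a shared filesystem, for example), and the ranks would have
to agree on a consistent point at which to save. That coordination is
where most of the tradeoffs come in.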

Hope this helps,

Brian

-- 
   Brian Barrett
   LAM/MPI Developer
   Make today a LAM/MPI day!