On Mar 19, 2008, at 7:16 AM, gauri dhopavkar wrote:
> I am pursuing a postgraduate degree in Computer Science. My project topic is
> Grid computing. I have built a LAM/MPI cluster in our college lab
> which allows parallel execution of jobs. Please answer these queries:
>
> 1. Are there any standard ready-made applications which can be run
> on this cluster for demonstration purpose?
>
There are many. Have a look in the examples/ directory of the LAM/MPI
tarball for simple ones; a quick Google search should also turn up a
good set of MPI applications.

> 2. Is there any mechanism for storing the status (details) of a process
> executing on one node of the cluster on another node? This
> is needed in case of node failure.
>
This is not part of the MPI standard. Generally, applications handle
node failures with custom (application-level) checkpointing or with
system-level checkpointing. LAM/MPI supports integration with the BLCR
system-level checkpointer on Linux systems. Have a look at our paper
on the subject for more details:
http://www.lam-mpi.org/papers/lacsi2003/lacsi-2003.pdf
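
To give a feel for the custom (application-level) approach, here is a
minimal sketch in plain Python. It is only an illustration under some
assumptions, not how LAM/MPI or BLCR work: it makes no MPI calls, the
names save_checkpoint, load_checkpoint, run, and CHECKPOINT_EVERY are
hypothetical, and the checkpoint path is assumed to live on storage
that another node can read (for example a shared filesystem). In a
real MPI job, each rank would typically write its own file, and the
ranks would need to coordinate so that the saved states are consistent.

```python
import json
import os
import tempfile

CHECKPOINT_EVERY = 100  # hypothetical interval, in loop iterations


def save_checkpoint(path, state):
    """Write state to path atomically, so a crash never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    # Rename over the old checkpoint only after the new one is fully on disk.
    os.replace(tmp_path, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh initial state if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def run(path, n_steps, fail_at=None):
    """Sum the integers 0..n_steps-1, checkpointing periodically.

    fail_at simulates a node failure by raising an exception at that step.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated node failure")
        state["total"] += step
        state["next_step"] = step + 1
        if state["next_step"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

If run() is interrupted, calling it again with the same path resumes
from the last saved step rather than from zero; only the work done
since the most recent checkpoint is repeated. A shorter interval loses
less work on failure but spends more time writing checkpoints.
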
> 3. How to take snapshot of a process state?
>
This is a difficult task. I'd recommend the paper linked above for more
details on system-level checkpointing. If you search the ACM or IEEE
digital libraries, I'm sure you'll find a number of papers on
system-level and application-level checkpointing for MPI. It's a
complex topic with lots of tradeoffs.

Hope this helps,
Brian
--
Brian Barrett
LAM/MPI Developer
Make today a LAM/MPI day!