On Wed, 4 Sep 2002, Brownlow, Charles wrote:
> I've got LAM/MPI working well on an 8-node Linux cluster,
>
> Now if I want to run more than one job across the cluster,
>
> I believe I need either a job scheduling tool or load mgt tool.
>
> Can anyone recommend one to me, preferably an easy one that users can use!
There are a couple of solutions. Ok, tons. But I only have real
experience with a couple, so here goes...
* Whiteboard scheduling:
This works best for a small group of people who tend to need a fixed
number of nodes for long periods of time. Basically, have a
{whiteboard, web page, piece of paper} where people "sign out" nodes.
You just trust users not to use nodes that they haven't signed out. And
trust other users to only "sign out" nodes when they need them.
Works great for small groups. Low setup time and cost. Doesn't scale
in any way. Up until recently, our 8 node cluster was run this way. A
motd entry on the head node let people know who had what. But we only
had 4 users...
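To make the sign-out idea concrete, here is a minimal Python sketch of such a board. It is only an illustration: the class name SignOutBoard, its methods, and the node names are all made up for this example, and nothing here stops a user from running on a node they never signed out. As with the real whiteboard, it relies on trust.

```python
class SignOutBoard:
    """Toy model of a whiteboard where users sign out cluster nodes.

    Hypothetical helper for illustration; it records intent only and
    does not enforce anything on the actual machines.
    """

    def __init__(self, nodes):
        # Map each node name to its current holder (None means free).
        self.holder = {node: None for node in nodes}

    def free_nodes(self):
        """Return the nodes nobody has signed out, in board order."""
        return [node for node, user in self.holder.items() if user is None]

    def sign_out(self, user, count):
        """Sign out `count` free nodes to `user`, or fail if too few are free."""
        free = self.free_nodes()
        if len(free) < count:
            raise RuntimeError(
                f"{user} asked for {count} nodes, only {len(free)} free"
            )
        taken = free[:count]
        for node in taken:
            self.holder[node] = user
        return taken

    def release(self, user):
        """Give back every node currently held by `user`."""
        for node, current in self.holder.items():
            if current == user:
                self.holder[node] = None

    def motd(self):
        """Render the board as text, e.g. for a motd entry on the head node."""
        return "\n".join(
            f"{node}: {user or 'free'}" for node, user in self.holder.items()
        )
```

A user would call sign_out("alice", 3) before starting a job and release("alice") afterwards; the text from motd() could be pasted into the head node's motd so everyone can see who has what.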
* PBS / OpenPBS
This is probably the traditional solution for "Beowulf" machines. There
is a commercial version (yeah, support!) and a free version. The free
version usually lags the commercial version in features, but generally
meets the needs of anyone who has a small cluster. We use PBS for our
clusters. Setting it up is a pain (IMHO), but there are worse things in
a sys-admin's life.
* LSF / others
There are many very not-free solutions out there. All have their
strengths and weaknesses. LSF is probably the best example, although
I've never used it. LSF is very, very not free and probably completely
overkill for 8 nodes.
You might want to post to the Beowulf mailing list if you have more
questions. While you might start a good religious debate or two, you will
probably get some good advice along the way.
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/