LAM/MPI General User's Mailing List Archives

From: Bogdan Costescu (bogdan.costescu_at_[hidden])
Date: 2004-03-22 10:49:14


On Mon, 22 Mar 2004, Karl Forner wrote:

> and moreover, it's difficult to predict if the job will be short or
> not, it depends on the input, the parameters and so on...

But I guess that most of them are short, otherwise you wouldn't have
asked... In our environment, people often make mistakes in the input
files they give to the parallel program, which then also exits within
a few seconds. But this does not matter much, as the common case is
still a job that runs for several days.

> I have a grid-like system (like globus, pbs, grid engine ...) on top
> of it that takes care of that.

With a smart grid-like system, LAM booting shouldn't take that long.
For example, the TM API of *PBS is pretty efficient in this respect.

SGE offers a solution for the "many short jobs" problem: task arrays.
The setup of the job (allocation of the nodes, running the Parallel
Environment setup script, etc.) is done once, and all jobs then run in
this environment. You could probably have some kind of spooler that
combines several jobs into such a task array.
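
To make the spooler idea a bit more concrete, here is a rough sketch in
Python of how a batch of short job commands might be planned as a single
task array. The function and file names (plan_task_array, tasks.txt,
run_task.sh) are made up for illustration. The only SGE features assumed
are "qsub -t 1-N" for array submission and the SGE_TASK_ID variable set
for each task. Nothing is submitted or written to disk; the function only
returns what a spooler would write and run.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class ArrayPlan:
    task_file_text: str   # one queued command per line
    wrapper_text: str     # shell script executed by every array task
    qsub_argv: List[str]  # command line a spooler would run to submit


def plan_task_array(commands: List[str],
                    task_file: str = "tasks.txt",     # hypothetical name
                    wrapper: str = "run_task.sh"      # hypothetical name
                    ) -> ArrayPlan:
    """Combine several short job commands into one array submission."""
    if not commands:
        raise ValueError("no commands to spool")
    for cmd in commands:
        # Task N is looked up by line number, so each command must be one line.
        if "\n" in cmd:
            raise ValueError("commands must not contain newlines")

    task_file_text = "".join(cmd + "\n" for cmd in commands)

    # Each array task picks line number $SGE_TASK_ID from the task file.
    wrapper_text = (
        "#!/bin/sh\n"
        f"cmd=$(sed -n \"${{SGE_TASK_ID}}p\" {shlex.quote(task_file)})\n"
        "eval \"$cmd\"\n"
    )

    # Task indices run from 1 to the number of queued commands.
    qsub_argv = ["qsub", "-t", f"1-{len(commands)}", wrapper]
    return ArrayPlan(task_file_text, wrapper_text, qsub_argv)


if __name__ == "__main__":
    plan = plan_task_array(["./solver in1.dat", "./solver in2.dat"])
    print(" ".join(plan.qsub_argv))
```

A real spooler would also have to decide when to flush the queue (after
a timeout, or once enough jobs have accumulated) and add whatever
Parallel Environment options the site requires. Both are left out here.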

> it's transparent

Transparency and efficiency are usually opposite goals...

-- 
Bogdan Costescu
IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen
Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY
Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868
E-mail: Bogdan.Costescu_at_[hidden]