Bogdan Costescu wrote:
>On Mon, 22 Mar 2004, Karl Forner wrote:
>
>
>
>>and moreover, it's difficult to predict if the job will be short or
>>not, it depends on the input, the parameters and so on...
>>
>>
>
>But I guess that most of them are short, otherwise you wouldn't have
>asked... In our environment, people often make mistakes in the input
>files that they provide to the parallel program which then decides to
>finish within a few seconds as well. But this does not matter that
>much, as the common case is still the one where the job runs for
>several days.
>
>
>
>>I have a grid-like system (like globus, pbs, grid engine ...) on top
>>of it that takes care of that.
>>
>>
>
>With a smart grid-like system, LAM booting shouldn't take that long.
>For example, the TM API of *PBS is pretty efficient in this respect.
>
>SGE offers a solution for the "many short jobs" problem: using the
>task arrays, such that the setup of the job is done once (allocation
>of the nodes, running the Parallel Environment setup script, etc.) and
>all jobs run then in this environment. You can probably have some kind
>of spooler that combines more jobs into such a task array.
>
>
>
>>it's transparent
>>
>>
>
>Transparency and efficiency are usually opposite goals...
>
>
>
My point is that my system works well: it is efficient, transparent,
and interactive. The problem is that the LAM session sometimes gets
messed up because, in some cases (crashes or interruptions), some files
are left open by the LAM daemon on the master node.
I can tell which LAM job had problems; I just need a way to clean it up
without affecting the other tasks, or a way to quickly start a new LAM
session.
Question: is it feasible to use a LAM session to start a new one?
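
A rough sketch of the "quickly start a new LAM session" option, for
discussion only. It assumes a LAM version that honours the
LAM_MPI_SESSION_SUFFIX environment variable (LAM 7.x documents it, as
far as I know), so that each job gets its own session directory and
daemons. The helper names, the "job-" prefix and the hostfile argument
are made up for illustration, and none of this is tested against the
grid layer.

```python
import os
import subprocess


def session_env(job_id, base_env=None):
    """Return a copy of the environment with a per-job LAM session suffix.

    Assumption: the installed LAM reads LAM_MPI_SESSION_SUFFIX to pick
    its session directory, so two suffixes mean two separate universes.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LAM_MPI_SESSION_SUFFIX"] = "job-%s" % job_id  # hypothetical naming
    return env


def run_in_private_session(job_id, hostfile, mpirun_args):
    """Boot a private LAM universe, run one job in it, then shut it down."""
    env = session_env(job_id)
    subprocess.check_call(["lamboot", hostfile], env=env)
    try:
        return subprocess.call(["mpirun"] + list(mpirun_args), env=env)
    finally:
        # Only the universe selected by this suffix is halted; sessions
        # booted with other suffixes should be left alone.
        subprocess.call(["lamhalt"], env=env)
```

Called as run_in_private_session(1234, "hosts", ["C", "./a.out"]), a
crashed job would in principle leave at most its own session behind,
which could then be halted with the same suffix set. The price is one
lamboot per job, which is exactly the startup cost in question.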
Karl