Phil Ehrens wrote:
>Karl Forner wrote:
>
>
>>Phil Ehrens wrote:
>>
>>
>>
>>>Don't create a separate lam user for each user, but DO create
>>>more than one. If one account suffices when no failures occur,
>>>and if failures cost you 50% of your duty cycle, have two users,
>>>and so forth, using them in round-robin fashion. That's what we
>>>do, and we achieve job rates of > 2000 jobs per hour over weeks
>>>of running.
>>>
>>>
>>>
>>>
>>I once tried a workaround like this, using the session prefix feature,
>>but sometimes the lamboot command hung, and it became hard to debug and
>>maintain.
>>
>>
>
>I am not using the session prefix feature, and this is not a workaround.
>This is the normal way of operating a high performance batch processing
>environment... let the operating system figure out what belongs to which
>job and you don't have to.
>
>I never experience lamboot hangs, and, as I say, we are getting continuous
>job throughput of >2000 jobs/hr with typically 6-8 users active, so that
>is as many users as you would ever have to create in all likelihood.
>
>
>
That sounds interesting. How do you switch from one lam user to the next?
I need to be able to delete jobs even while they are running; do I
have to do that under the 'root' account?
So, to sum up: are you running your system as root and then issuing an 'su'
command to switch to the appropriate user?
>>If I have to spend that much more time, I'd rather improve LAM than my
>>private software, as a kind of deserved contribution :-)
>>
>>
>
>Noble, but lam is not a batch processing environment.
>
>
I totally agree that it is not a batch processing environment, but it has
the concept of a session, which seems designed to run more than a single
job: if you only ever ran a single job, you would not need lam daemons
able to execute processes.
But my usage is not intended as a traditional batch system: users
get their output "in real time", as if they were running a program on
their local computer. It is not intended for submitting a thousand jobs
at once and then checking at some later time whether they are done.
LAM is not far from what I really need, apart from a few misfeatures...
My problems usually occur when too many users cancel their jobs
interactively by typing 'CTRL+C'.
Otherwise, when the lam jobs are run via an automatic pipeline, it's OK.
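
To check that I understand the round-robin idea quoted above, here is a
minimal Python sketch of how I picture it. This is only my guess, not a
description of Phil's actual setup: the account names, the pool size, and
the use of 'sudo -u' to switch accounts are all assumptions, and the job
names are placeholders.

```python
from itertools import cycle

# Hypothetical pool of dedicated accounts; names and count are assumptions.
LAM_USERS = ["lamuser1", "lamuser2"]


def make_assigner(users):
    """Return a function that hands out accounts in round-robin order."""
    pool = cycle(users)

    def next_user():
        return next(pool)

    return next_user


def wrap_command(user, argv):
    """Prefix a job's argv so it would run as `user` via sudo.

    Assumes sudo is configured to allow this; nothing is executed here.
    """
    return ["sudo", "-u", user] + list(argv)


if __name__ == "__main__":
    next_user = make_assigner(LAM_USERS)
    # Placeholder job names, only to show the rotation.
    for job in (["job_a"], ["job_b"], ["job_c"]):
        print(wrap_command(next_user(), job))
```

With two accounts, the third job goes back to the first account, so the
operating system's per-user process ownership tells the jobs apart.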