
LAM/MPI General User's Mailing List Archives


From: Karl Forner (Karl.Forner_at_[hidden])
Date: 2003-04-15 12:47:08


Phil Ehrens wrote:

>Karl Forner wrote:
>
>
>>I have set up a batch system that seems similar to yours, and in my case
>>the failures with the persistent lamds are tied to two bugs:
>>- mpirun does not kill all the processes if a LAM application fails, so
>>some files remain open
>>- some files also stay open if the job is interrupted, for instance by a
>>SIGINT (CTRL+C)
>>
>>So I found a kind of workaround, which you can find in the mailing list archive:
>>"LAM: Re: mpirun (set_stdio): Too many open files in system"
>>
>>
>>
>
>Hi Karl,
>
>We dealt with that one quite a while ago. We found that we needed
>to go on a killing spree (ssh/rsh kill -9 pid) if lamclean does
>not return within 20 seconds. This is a last resort that we seldom
>need to exercise, however, since lamclean seems to do the right
>thing most of the time.
>
>
>
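Phil's fallback (try `lamclean`, and if it has not returned within 20 seconds, `kill -9` the leftover processes over ssh/rsh) could be sketched in Python roughly as below. This is an illustration under assumptions, not Phil's actual script: the names `clean_with_fallback` and `kill_commands` are hypothetical, and the caller is assumed to already know the (host, pid) pairs of the job's processes, for example by recording them at launch time.

```python
import subprocess

# Hypothetical helper names, for illustration only.
LAMCLEAN_TIMEOUT_S = 20  # the 20-second limit mentioned in the thread


def kill_commands(targets, remote_shell="ssh"):
    """Build one 'kill -9 <pid>' command per (host, pid) pair, run via ssh or rsh."""
    return [[remote_shell, host, "kill", "-9", str(pid)] for host, pid in targets]


def clean_with_fallback(targets, clean_cmd=("lamclean",),
                        timeout=LAMCLEAN_TIMEOUT_S, remote_shell="ssh",
                        kill_runner=subprocess.run):
    """Run lamclean; if it has not returned within `timeout` seconds,
    fall back to kill -9 on each listed (host, pid) pair.

    Returns True if lamclean finished in time, False if the fallback ran.
    """
    try:
        subprocess.run(list(clean_cmd), timeout=timeout)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the hung lamclean before raising.
        for cmd in kill_commands(targets, remote_shell):
            # Default check=False: a PID that is already gone is not an error.
            kill_runner(cmd)
        return False
```

A call such as `clean_with_fallback([("node01", 4242), ("node02", 4313)], remote_shell="rsh")` would try `lamclean` first and only then fall back to the remote kills. The fallback is limited to the PIDs passed in, while `lamclean` itself acts on the whole running LAM session.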
But how can you use 'lamclean' without affecting the other jobs running
on the same persistent set of lamds?
Or perhaps you don't run simultaneous jobs?

Karl