Hi --
My apologies for the very late reply to this question!
I was able to replicate this with 7.0.4 by pressing CTRL-C twice in
quick succession. It appears to be fixed in the development version in
our repository. The fix probably will not go into the 7.0.5 release,
which is due shortly, but it will make it into the 7.1 release (which
may be about a month away). If the problem is not severe for you and you
can wait until then, no action is needed. Otherwise, you can get an
anonymous checkout of our repository version or the latest nightly
tarball (see http://www.lam-mpi.org/svn/). Warning: this version can be
unstable since it is still under development.
-Vishal
On Fri, 19 Mar 2004, Karl Forner wrote:
# Hello,
#
# I've been using LAM in production on two clusters for years, and there's
# a very annoying bug that is still present even
# in the latest version.
#
# When you kill a lam job, for example by typing 'CTRL+C' in the terminal,
# some files are left open by the lam daemon.
# Then the number of open files reaches 71, and at this point you can no
# longer launch new jobs; you get an error message like:
#
# lamexec (set_stdio): Too many open files in system
#
# It is easy to reproduce, for example on a Linux cluster with Red Hat
# 7.2 running lam 7.0.4.
#
# % lamboot -b -v
#
# then get the pid of the lam daemon, e.g.
# % PID=`pgrep lamd -u $USER`
#
# then count the number of open files (plus one) :
# % ls -l /proc/$PID/fd | wc -l
# you should have 11 open files
#
# then repeat the following process
#
# launch a simple lam command
# % lamexec N sleep 10
# and interrupt it with one or two 'CTRL+C'
# you can check with " ls -l /proc/$PID/fd | wc -l" that the number of
# open files is increasing.
#
# repeat it until you reach 71 open files, then you should have the error
# message.
#
# Is this bug already known?
# Do you need some help to fix it?
#
# Thanks Karl FORNER
#
#
# _______________________________________________
# This list is archived at http://www.lam-mpi.org/MailArchives/lam/
#
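
The reproduction steps quoted above can be wrapped in a small shell sketch
for watching the descriptor count of the LAM daemon. It assumes Linux with
/proc mounted and the same pgrep call as in the report. The function names
count_open_fds and lamd_pid are made up for illustration and are not part
of LAM. Unlike "ls -l | wc -l", it uses "ls -A", which prints no "total"
line, so the result is the actual count rather than the count plus one.

```shell
#!/bin/sh
# Hypothetical helpers, not part of LAM. Assumes Linux with /proc mounted.

# Print the number of open file descriptors of process $1.
# Prints 0 if the process does not exist or its fd directory cannot be read.
count_open_fds() {
    ls -A "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# Print the PID of lamd owned by the current user
# (same pgrep lookup as in the report; assumes a single lamd per user).
lamd_pid() {
    pgrep -u "${USER:-$(id -un)}" lamd
}
```

A possible way to use it, after lamboot and with a single lamd running:

    PID=$(lamd_pid)
    count_open_fds "$PID"

Running count_open_fds again after each interrupted "lamexec N sleep 10"
should show the number growing if the leak described in the report is
present.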