Hi, LAM team,
We have seen quite a few times that lamd hangs in kqsync() and
takes more than 99% of CPU time, as shown by "top" on Linux.
We don't know exactly how this situation starts. We only notice that
some jobs are running extremely slowly, and then we find that lamd is
taking more than 99% of the CPU time and is hanging in kqsync().
Here is the stack trace for lamd from gdb.
----------------------------------------------------------
0x08052ce1 in kqsync ()
(gdb) where
#0 0x08052ce1 in kqsync ()
#1 0x0805378c in main ()
#2 0x40053657 in __libc_start_main (main=0x8053570 <main>, argc=10,
ubp_av=0xbffff3c4, init=0x8049540 <_init>, fini=0x805c9d0 <_fini>,
rtld_fini=0x4000dcd4 <_dl_fini>,
stack_end=0xbffff3bc) at ../sysdeps/generic/libc-start.c:129
(gdb) quit
-----------------------------------------------------------------
Our system is running Red Hat Linux 7.2 with gcc 2.93.5.
Please help.
Thank you.
Lily