Now that lamboot no longer crashes with slurm 0.3.10, I find that the
nightly SVN lamboot causes srun to hang in the following scenario:
$ srun -n 2 -A
$ lamboot
$ exit
$ exit
srun: error: eng-24: task0: Killed
srun: Terminating job
This leaves behind a lamd process, which is stuck here:
#0 0x0000003c484be455 in __select_nocancel () from /lib64/tls/libc.so.6
#1 0x000000000040cf52 in kio_req ()
   at ../../../../otb/sys/kernel/kernelio.c:331
#2 0x000000000040e297 in run_kernel (argc=1, argv=0x7fbffff308)
   at ../../../../otb/sys/kernel/kouter.c:176
#3 0x0000000000404a9a in main (argc=1, argv=0x7fbffff308)
   at ../../../../otb/sys/lamd/lamd_main.c:105
More details tomorrow.