Hmm.
This shouldn't happen -- SLURM should be killing this lamd. Let me try
to reproduce on my cluster and see what happens...
On Feb 4, 2005, at 2:26 AM, Bryan O'Sullivan wrote:
> Now that I have lamboot not crashing with slurm 0.3.10, I find that the
> nightly SVN lamboot causes srun to hang in the following scenario:
>
> $ srun -n 2 -A
> $ lamboot
> $ exit
> $ exit
> srun: error: eng-24: task0: Killed
> srun: Terminating job
>
> This leaves behind a lamd process, which is stuck here:
>
> #0  0x0000003c484be455 in __select_nocancel () from /lib64/tls/libc.so.6
> #1  0x000000000040cf52 in kio_req () at ../../../../otb/sys/kernel/kernelio.c:331
> #2  0x000000000040e297 in run_kernel (argc=1, argv=0x7fbffff308) at ../../../../otb/sys/kernel/kouter.c:176
> #3  0x0000000000404a9a in main (argc=1, argv=0x7fbffff308) at ../../../../otb/sys/lamd/lamd_main.c:105
>
> More details tomorrow.
>
> <b
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/