LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-07-30 07:45:19


On Tue, 29 Jul 2003, Milan Diebel wrote:

> The way I use lam-mpi is to lamboot a whole bunch of nodes and submit
> several independent mpirun jobs to the booted "lam-cluster". A single
> mpirun job usually uses several nodes.
>
> I am wondering whether it is possible to kill individual mpirun jobs?
> Sometimes a single job hangs, or I would like to terminate the
> process for other reasons. In my understanding I can't use lamhalt or
> wipe since other jobs are still running in the lam universe, which I
> don't want to terminate.

That is correct. lamhalt and wipe will destroy the entire LAM universe
(and all jobs running in it). Similarly, lamclean will kill all processes
within a LAM universe (but the universe is still left running).

In this case, the best that you can do is probably to send a ctrl-C
(SIGINT) to mpirun or lamexec, which should make a "best effort" attempt
to kill the processes that it started. However, if your processes hold
additional resources (e.g., MPI-2 published names), they will not be
cleaned up by simply killing mpirun.
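
If you launch your jobs from a script, the same thing can be done programmatically. The following is only a minimal sketch, assuming each mpirun was started with Python's subprocess module so that you hold a handle to it; the helper name interrupt_job and the program name my_prog are hypothetical. It sends SIGINT (the signal that ctrl-C delivers) to one launcher process and leaves every other job in the universe alone.

```python
import signal
import subprocess


def interrupt_job(proc, grace_seconds=10.0):
    """Send SIGINT to a single launcher process and wait for it to exit.

    `proc` is a subprocess.Popen handle for one mpirun (or lamexec).
    Returns the exit status, or None if the process is still running
    after `grace_seconds`; the caller decides what to do in that case.
    """
    proc.send_signal(signal.SIGINT)
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        return None


# Hypothetical usage (my_prog is a placeholder for your MPI program):
#   job = subprocess.Popen(["mpirun", "C", "my_prog"])
#   ...
#   interrupt_job(job)
```

If you only know the process ID of the mpirun, os.kill(pid, signal.SIGINT) has the same effect. As noted above, this does not clean up resources such as published names.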

You may wish to lamboot separate universes if you need this flexibility.
In 7.0, you can set the LAM_MPI_SESSION_SUFFIX environment variable to a
unique value before lambooting. This gives you an effectively separate
LAM universe, even if some of your lamboots overlap on the same nodes.
See the 7.0 User Guide (look up LAM_MPI_SESSION_SUFFIX in the index) for
more details.
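
As a rough illustration of the separate-universe approach, the sketch below builds an environment with a unique LAM_MPI_SESSION_SUFFIX and boots a universe with it. The helper names session_env and boot_universe, and the hostfile argument, are hypothetical; the sketch also assumes that later commands aimed at that universe (mpirun, lamhalt, and so on) are run with the same suffix set, so check the User Guide for the exact behavior.

```python
import os
import subprocess


def session_env(suffix, base_env=None):
    """Return a copy of the environment with LAM_MPI_SESSION_SUFFIX set.

    The caller's environment (or `base_env`) is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LAM_MPI_SESSION_SUFFIX"] = suffix
    return env


def boot_universe(suffix, hostfile):
    """Boot a LAM universe identified by `suffix` (assumes lamboot is on PATH)."""
    env = session_env(suffix)
    subprocess.run(["lamboot", hostfile], env=env, check=True)
    return env  # reuse this env for commands that target this universe


# Hypothetical usage, one universe per job:
#   env_a = boot_universe("job_a", "hosts_a")
#   subprocess.run(["mpirun", "C", "my_prog"], env=env_a)
#   subprocess.run(["lamhalt"], env=env_a)   # halts only this universe
```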

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/