
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-03-20 07:58:52


On Wed, 3 Mar 2004, Gautam wrote:

> We have a simple home-brewed scheduler that we use for research purposes.
> This includes a server application that runs on the head node, and every
> client machine on the cluster runs a client daemon.
>
> I want the ability to spawn MPI processes from this client daemon.
> Each instance of the MPI program spawned from the client daemon
> should then be able to set up the MPI environment and run
> as an MPI application.
>
> Moreover, I would also like to be able to catch SIGCHLD signals
> when something goes wrong in the MPI application.
>
> Is this possible with LAM/MPI? Would using lamexec be an option? Where
> can I get more information on how to accomplish this?

Sorry for the delays in response -- mid-terms, spring break, and too many
other projects have been getting in the way of us answering mails on this
list recently. :-\

Yes and no.

Keep in mind that LAM spawns its own daemons and does all of its work
through those daemons. So the MPI applications are actually launched
by the LAM daemons, not by your daemons. Because the MPI processes are
not children of your daemon, you won't be able to catch SIGCHLD signals
from them.
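A plain POSIX sketch of why this is so: SIGCHLD and waitpid() only report on a process's direct children. The "intermediate" process below is a hypothetical stand-in for a LAM daemon and the "worker" for an MPI process; this is not LAM code, just standard-library process calls.

```python
import os


def reaped_children():
    """Fork an intermediate process that forks its own worker, then reap.

    Returns (intermediate_pid, reaped), where `reaped` lists every pid that
    this process's waitpid() calls collected.
    """
    intermediate = os.fork()
    if intermediate == 0:
        # Hypothetical stand-in for a LAM daemon: it launches the worker
        # (the "MPI process") itself, so it is the worker's parent.
        worker = os.fork()
        if worker == 0:
            os._exit(3)  # the worker fails with a nonzero status
        os.waitpid(worker, 0)  # the worker's exit is reported here only
        os._exit(0)
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, 0)  # wait for any child of ours
        except ChildProcessError:  # ECHILD: no children left to wait for
            break
        reaped.append(pid)
    return intermediate, reaped


if __name__ == "__main__":
    intermediate, reaped = reaped_children()
    # Only the direct child shows up; the failed worker never does.
    print(reaped == [intermediate])
```

The outer process (your client daemon, in this analogy) sees only the intermediate process exit normally; the worker's failure status goes to its own parent.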

But the answer to whether your run-time environment (RTE) can spawn MPI
jobs is yes: you can write a boot SSI module (which isn't that hard) so
that lamboot talks directly to your RTE to launch the LAM daemons.
mpirun will then talk to the LAM daemons and launch the MPI processes,
but everything will be a descendant of your RTE. Hence, your RTE can
reliably kill everything in the process group at the end of the job.
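The process-group cleanup can be sketched with standard POSIX calls. This assumes (it is not stated above) that the RTE starts the job in a fresh process group and that neither the LAM daemons nor the MPI processes move themselves into another group or session. The `sh -c` command is a hypothetical stand-in for whatever the RTE actually launches; the LAM boot SSI interface itself is not shown.

```python
import os
import signal
import subprocess


def start_job(cmd):
    # start_new_session=True runs setsid() in the child, so the child becomes
    # the leader of a new session and process group (pgid == child pid).
    return subprocess.Popen(cmd, start_new_session=True)


def kill_job(proc, sig=signal.SIGTERM):
    # killpg() sends sig to every process in the child's group, including
    # descendants that did not change their own process group (assumption:
    # the launched daemons/processes stay in this group).
    os.killpg(proc.pid, sig)
    # Reap the direct child; a negative return code means "killed by signal".
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical stand-in for the job: a shell with two background children.
    job = start_job(["sh", "-c", "sleep 60 & sleep 60 & wait"])
    print(kill_job(job) == -signal.SIGTERM)
```

Signalling the whole group, rather than just the pid the RTE started, is what lets the cleanup reach grandchildren the RTE never forked directly.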

Does that make sense?

If you're interested in writing a boot SSI module, you might want to move
this line of questioning over to the lam-devel mailing list
(http://www.lam-mpi.org/mailman/listinfo.cgi/lam-devel).

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/