LAM/MPI General User's Mailing List Archives

From: David Cronk (cronk_at_[hidden])
Date: 2005-02-16 15:02:37


I have a real-time application that uses MPI to send work to various
modules, or stages. This is an extension of a master/slave model.
Here is a synopsis:

Module A has one or more children (module B) and sends work to these
children. Module B has one or more children (module C) and sends work
to these children. Etc. The load is varying and we wish to only use
processors when needed. Our approach is to use dynamic process
management. When there is a lot of work for module A it may spawn
additional module Bs. Additionally, when a module B has too much work
it may spawn additional module Cs. Etc. When a module starts to
catch up it may terminate some of its children. For reasons of
efficiency (no collective communication) module Bs do not communicate
with each other, module Cs do not communicate with each other, etc.
Also, modules may be terminated in an arbitrary order; i.e., we may spawn
modules b1, b2, b3, b4, b5, and b6, in that order, but then terminate
b3, then b1, then b5. Finally, to avoid collective communication we
spawn one process at a time. That is, even if we wish to spawn 5
processes, we do this in a loop one at a time. We track processes based
on the inter-communicator returned from the spawning call.
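
The bookkeeping described above can be sketched without any MPI calls.
This is only an illustration: the class ChildRegistry and its methods are
hypothetical names, and the integer handle stands in for the
inter-communicator that a real single-process spawn call would return.

```python
from itertools import count


class ChildRegistry:
    """Tracks children by the handle returned from each one-process spawn."""

    def __init__(self):
        self._children = {}       # handle -> child label
        self._handles = count()   # stand-in source of unique handles

    def spawn_one(self, label):
        # Placeholder for one spawn call that creates exactly one process
        # and returns its inter-communicator (here: an integer handle).
        handle = next(self._handles)
        self._children[handle] = label
        return handle

    def terminate(self, handle):
        # Children can be removed in any order because each one is
        # looked up by its own handle, not by its spawn position.
        return self._children.pop(handle)

    def live(self):
        return sorted(self._children.values())


registry = ChildRegistry()
# Spawn in a loop, one process at a time, as described in the post.
handles = {name: registry.spawn_one(name)
           for name in ["b1", "b2", "b3", "b4", "b5", "b6"]}
# Terminate out of spawn order: b3, then b1, then b5.
for name in ["b3", "b1", "b5"]:
    registry.terminate(handles[name])
print(registry.live())  # ['b2', 'b4', 'b6']
```

Keying on the per-child handle is what makes out-of-order termination
cheap; no child needs to know about its siblings.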

Our problem is controlling where processes are spawned. If we use LAM's
default case, all processes will be spawned on the same processor. This
obviously is no good. If we use "not root" scheduling all processes
will be spawned on just 2 processors (the first 2 listed in the boot
schema). Again, no good. We cannot use a file to define what nodes to
spawn on because we don't know a priori what the load is going to be and
thus don't know what processors will be busy. That only leaves round
robin scheduling. However, since we are trying to avoid any collective
communication, a spawning process has no way of knowing how many
processes have been spawned and thus has no way of knowing where to
start the round robin scheduling (which amounts to specifying a node
since we only spawn a single process at a time).

I have an idea on a work-around using round-robin, but it is ugly, not
very portable, and does not make the best use of the available
resources. Basically we can look at this as a tree with module A the
root of the tree, module Bs will be nodes at level 2, module C nodes at
level 3, etc. In other words, Module Bs will be roots of branches from
module A, module Cs roots of branches off module Bs, etc. Module A will
know which nodes can be used for module Bs and keep track of which nodes
are busy. Additionally, it can determine which nodes can be used for
branches and pass this info when spawning a module B. Module B will
then do the same with regard to module C, etc.
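
A minimal sketch of that tree-style bookkeeping, with no MPI calls, might
look like the following. The class NodePool, its methods, and the
"delegate" parameter are hypothetical names invented for illustration,
and the node labels ("n1", "n2", ...) are assumed to match whatever node
designation would be given to round-robin scheduling as its starting
node. It also assumes a child's whole subtree has terminated before the
child itself is terminated.

```python
class NodePool:
    """Nodes owned by one module; free ones can host a new child."""

    def __init__(self, nodes):
        self.free = list(nodes)
        self.busy = {}  # child name -> (host node, nodes delegated to it)

    def spawn_child(self, name, delegate=0):
        """Pick a host node for one child and set aside `delegate`
        further nodes that the child may use for its own children."""
        if len(self.free) < 1 + delegate:
            raise RuntimeError("not enough free nodes")
        host = self.free.pop(0)
        delegated = [self.free.pop(0) for _ in range(delegate)]
        self.busy[name] = (host, delegated)
        # The host would be the node named in the spawn call; the new
        # pool is the information passed down to the child at spawn time.
        return host, NodePool(delegated)

    def terminate_child(self, name):
        # Reclaim the child's host and everything delegated to it
        # (assumes the child's subtree is already gone).
        host, delegated = self.busy.pop(name)
        self.free.extend([host] + delegated)


pool_a = NodePool(["n1", "n2", "n3", "n4", "n5", "n6"])
host_b1, pool_b1 = pool_a.spawn_child("b1", delegate=2)  # b1 on n1
host_b2, pool_b2 = pool_a.spawn_child("b2", delegate=1)  # b2 on n4
host_c1, _ = pool_b1.spawn_child("c1")                   # c1 on n2
print(host_b1, host_b2, host_c1)  # n1 n4 n2
print(pool_a.free, pool_b1.free)  # ['n6'] ['n3']
```

The sketch also shows the resource drawback: node n3 sits idle inside
b1's pool even if b2 is overloaded, because pools are fixed at spawn
time and siblings never communicate.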

I think this will work but it is ugly and I would like to figure out a
better way to do this.

Any ideas?

Thanks,
Dave.

-- 
Dr. David Cronk, Ph.D.                      phone: (865) 974-3735
Research Leader                             fax: (865) 974-8296
Innovative Computing Lab                    http://www.cs.utk.edu/~cronk
University of Tennessee, Knoxville