LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-05-13 13:10:40


On May 12, 2005, at 9:55 PM, Prakash Velayutham wrote:

> Thanks for the response. In this case, the entire set of nodes in the
> LAM universe will not be assigned any other jobs by the scheduler
> (even if they are not actually running any MPI jobs), correct? I want
> the scheduler-MPI interaction to be more dynamic than that. That way I
> could ask for more nodes between iterations (or every several
> iterations), and if the scheduler has gotten some nodes back from
> other completed jobs (and there are no other queued jobs, based on
> priorities and other scheduling policies), it could assign them to me
> and I could run the MPI job on more nodes than I started with. After
> some iterations, I could give back some nodes when the MPI job I am
> running does not need so many.

David C. is right -- you need complex interactions with your job
scheduler to do this. LAM already has the capability to expand and
shrink its universe (and you could certainly MPI_COMM_SPAWN onto the
nodes that were newly added to the universe), but first and foremost,
you have to get more resources.

Also, don't forget that allocating/claiming resources, expanding a
universe, and spawning onto them (and then doing the inverse to release
resources) is a somewhat expensive operation. It's not really
something that you'd want to do often.

> Can anyone explain to me what would let me do this? If this is
> currently not possible, how much effort would it be for me to modify
> the LAM sources to do this? That could best be determined by one of
> the developers, I guess. What I am wondering is: if this cannot be
> supported, then what is the point of MPI-2 dynamic process management
> in the context of a resource manager? Wondering ...!!!

That is a question that no one has been able to answer. :-)

IBM, for example, doesn't implement MPI_COMM_SPAWN on their Big Iron
machines for this very reason -- since their machines are designed to
be full all the time (i.e., there are never any free nodes), there's no
point. (specifically: if you have enough $$$ to buy Big Iron IBM, you
likely have enough jobs to keep it busy 100% of the time ;-) ).

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/