LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-08-29 06:35:46


This is only "sorta" supported. Note that lamgrow / lamshrink really only
expand / shrink the LAM *universe*, not a running application. You'll
need to add special stuff into an application to make it aware that the
universe has grown or shrunk.

For example, you may want to have one "master" process that
MPI_COMM_SPAWN's a bunch of workers and dynamically gives them work to
do. As a new node becomes available, the master can be notified
somehow (perhaps via some mechanism outside of MPI, such as a socket,
pipe, file, or other IPC mechanism) and it can spawn a new worker
there. Similarly, when a node becomes unavailable, the master can be
told, and it can send a "please shut yourself down" message to the
worker.
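
A minimal sketch of the bookkeeping such a master might do, in plain C.
Everything named here (Pool, pool_init, pool_find, pool_sync, the fixed
table sizes) is hypothetical and only for illustration; none of it is
part of LAM/MPI. It assumes the master has already obtained the current
list of host names through whatever out-of-band mechanism you choose,
and it only marks the points where the real MPI calls would go.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORKERS 64   /* hypothetical upper bound on workers */
#define HOST_LEN    64   /* hypothetical max host name length   */

/* One slot per worker the master has started. */
typedef struct {
    char host[HOST_LEN];
    int  active;
} Worker;

typedef struct {
    Worker slots[MAX_WORKERS];
} Pool;

/* Mark every slot as free. */
void pool_init(Pool *p) {
    memset(p, 0, sizeof(*p));
}

/* Return the slot index of the active worker on `host`, or -1. */
int pool_find(const Pool *p, const char *host) {
    for (int i = 0; i < MAX_WORKERS; i++)
        if (p->slots[i].active && strcmp(p->slots[i].host, host) == 0)
            return i;
    return -1;
}

/* Reconcile the pool with the current host list. Reports how many
 * workers would be started and stopped through the out-parameters. */
void pool_sync(Pool *p, const char *hosts[], int nhosts,
               int *spawned, int *stopped) {
    assert(p != NULL && spawned != NULL && stopped != NULL);
    *spawned = 0;
    *stopped = 0;

    /* Retire workers whose host is no longer in the list. */
    for (int i = 0; i < MAX_WORKERS; i++) {
        int still_here = 0;
        if (!p->slots[i].active)
            continue;
        for (int j = 0; j < nhosts; j++)
            if (strcmp(p->slots[i].host, hosts[j]) == 0)
                still_here = 1;
        if (!still_here) {
            /* A real master would send the "please shut yourself
             * down" message to this worker here (e.g. MPI_Send). */
            p->slots[i].active = 0;
            (*stopped)++;
        }
    }

    /* Start one worker on each host that has none yet. */
    for (int j = 0; j < nhosts; j++) {
        if (pool_find(p, hosts[j]) >= 0)
            continue;
        for (int i = 0; i < MAX_WORKERS; i++) {
            if (!p->slots[i].active) {
                /* A real master would call MPI_Comm_spawn here and
                 * keep the returned intercommunicator in the slot. */
                snprintf(p->slots[i].host, HOST_LEN, "%s", hosts[j]);
                p->slots[i].active = 1;
                (*spawned)++;
                break;
            }
        }
    }
}
```

The master would call pool_sync() each time it is told that the host
list has changed, then hand out work only to slots that are still
active. How a spawned worker gets placed on a particular node depends
on the MPI implementation, so check its documentation before relying
on any specific placement mechanism.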

Hope that helps.

On Aug 28, 2004, at 10:18 AM, Yoni Yoni wrote:

> For an 'embarrassingly parallel' problem, I'm using several machines
> running LAM/MPI (7.0.4 if that matters). The machines are scattered
> around and are managed separately.
>
> My current setup is based on the Mandelbrot example.
>
> The main shortcoming of this code is that I can't add and remove
> machines while the program is running.
>
> I would like to have a client/server style setup that can handle
> machines joining and leaving the pool of available machines (i.e.
> using lamgrow and lamshrink) so that the server will assign/reassign
> work to them as well.
>
> Any help / pointers / example will be greatly appreciated.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/