LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-05-14 15:59:03


On Wed, 14 May 2003, Pak, Anne O wrote:

> [snipped]
> I have thought about putting an infinite loop in the slave/spawned
> routine and using MPI_Barrier to synchronize data transfers between the
> master and slave, but the fact that MATLAB/MEX is involved means the
> master routine will have to exit completely and return control back to
> MATLAB, so I am not sure the slave node can keep the MPI connection open
> with the master node if the master node's MEX program exits completely
> to MATLAB.
>
> Does anyone know if the MPI connection is kept alive if MPI_finalize is
> never called?

Yes, it is.

However, it depends on MATLAB's invocation model -- does it unload
your shared module after the call finishes? If it unloads the
module, then you might have problems.

If you have a huge problem with this, can you alter your model a
little? Thinking off the top of my head -- would something like this
be possible:

- your matlab script launches
- it calls MPI_Init
- check for a published name
- if the published name does not exist
  - spawn a master (i.e., a new, independent process)
  - the master publishes a name
- if the published name does exist
  - MPI_Comm_connect to the master
- the master spawns a bunch of slaves to do the work
- the matlab script sends a bunch of work to the master
- the master farms it out to all the slaves
- the slaves do all the work and eventually send the result(s) to the
  master
- the master sends the result(s) to the matlab script
- the matlab script disconnects from the master
- the matlab script finishes
---> note that the master and all of the slaves are still running

The next time that the matlab script starts up, it sees that the
master is running and just connects to it (rather than spawning a new
one). Hence, all of your slaves are durable and keep their data (no
need to re-scatter the same data every time).
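
A minimal sketch of that attach-or-spawn decision, in C. The MPI calls
are hidden behind function pointers so the control flow can be read and
exercised without an MPI runtime. The names attach_ops,
attach_to_master, and the fake_* helpers are hypothetical and not part
of LAM/MPI. In a real client the hooks would presumably wrap
MPI_Lookup_name, MPI_Comm_spawn, and MPI_Comm_connect.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical hooks; each stands in for a real MPI-2 call. */
typedef struct {
    /* would wrap MPI_Lookup_name: true if the service name is published */
    bool (*lookup)(const char *service, char *port, size_t len);
    /* would wrap MPI_Comm_spawn: start a new, independent master */
    bool (*spawn_master)(void);
    /* would wrap MPI_Comm_connect: attach to the master's port */
    bool (*connect_port)(const char *port);
} attach_ops;

typedef enum { ATTACH_FAILED, ATTACHED_EXISTING, ATTACHED_NEW } attach_result;

/* Connect to a running master, spawning one first if none is published. */
attach_result attach_to_master(const attach_ops *ops, const char *service)
{
    char port[256];
    attach_result how = ATTACHED_EXISTING;

    if (!ops->lookup(service, port, sizeof port)) {
        /* No published name: spawn a master, which publishes the name. */
        if (!ops->spawn_master())
            return ATTACH_FAILED;
        /* Assumption: the name is visible as soon as the spawn returns.
           A real client may need to retry this lookup. */
        if (!ops->lookup(service, port, sizeof port))
            return ATTACH_FAILED;
        how = ATTACHED_NEW;
    }
    return ops->connect_port(port) ? how : ATTACH_FAILED;
}

/* In-memory fakes so the flow can be tried without MPI. */
static bool fake_published = false;
static int  fake_spawns = 0;

static bool fake_lookup(const char *service, char *port, size_t len)
{
    (void)service;
    if (!fake_published)
        return false;
    snprintf(port, len, "fake-port");
    return true;
}

static bool fake_spawn(void)
{
    fake_spawns++;
    fake_published = true;   /* the new master "publishes" its name */
    return true;
}

static bool fake_connect(const char *port)
{
    return port[0] != '\0';
}

static const attach_ops fake_ops = { fake_lookup, fake_spawn, fake_connect };
```

With the fakes, the first call to attach_to_master(&fake_ops, ...)
spawns a master and the second call only connects, which is the
behavior described above. Note that in real MPI code a failed
MPI_Lookup_name is only reported back to the caller if the error
handler is set to MPI_ERRORS_RETURN; the default handler aborts.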

Hence, your architecture is that your matlab script acts as a command
input to the persistent master and its slaves. You attach/detach to
the master in order to send it a command and get the results.

You'll need some kind of "please die now" command, too, so that when
all processing is done, the matlab script can tell the master to kill
all of its slaves and die (and unpublish the name).
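
One way the master's command loop could look, as a sketch in C. The
command codes and the master_state struct are hypothetical (no protocol
is defined here), and the MPI traffic is replaced by an array of
commands so that the logic stands on its own. The comments mark where
the real sends, MPI_Unpublish_name, and MPI_Close_port would go.

```c
#include <stddef.h>

/* Hypothetical command codes sent by the MATLAB-side client. */
typedef enum { CMD_WORK = 1, CMD_DIE = 2 } command;

typedef struct {
    int jobs_done;       /* work commands farmed out so far */
    int slaves_alive;    /* slaves still running */
    int name_published;  /* 1 while the service name is published */
} master_state;

/* Handle one command; returns 0 when the master should leave its loop. */
int master_handle(master_state *m, command cmd)
{
    switch (cmd) {
    case CMD_WORK:
        m->jobs_done++;         /* real code: send work to slaves, gather results */
        return 1;
    case CMD_DIE:
        m->slaves_alive = 0;    /* real code: tell every slave to exit */
        m->name_published = 0;  /* real code: MPI_Unpublish_name, MPI_Close_port */
        return 0;
    }
    return 1;                   /* unknown command: ignore and keep serving */
}

/* Serve commands until CMD_DIE arrives or the input runs out.
   Returns the number of commands consumed. */
size_t master_serve(master_state *m, const command *cmds, size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (!master_handle(m, cmds[i++]))
            break;
    }
    return i;
}
```

Anything queued after CMD_DIE is never processed, so the client should
send it last, after it has collected all of its results.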

Does that make sense?

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/