LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-05-19 18:29:22


On Thu, 15 May 2003, Pak, Anne O wrote:

> [snipped]
> so far, i've only managed to get the following working:
> 1. matlab script checks for a published name
> 2. it spawns a master (using MPI_Comm_spawn)
> 3. In the spawned code, the master publishes a name (lobelia)
>
> questions:
> 1. i am using mpi_comm_spawn to spawn this independent master process,
> but mpi_comm_spawn doesn't seem to like it when i specify a maxproc < 2,
> which is why i have
>
> MPI_Comm_spawn("/home/Galadriel/matlab/anne/uplink/update_master3",
> MPI_ARGV_NULL,
> 2, MPI_INFO_NULL, 0, MPI_COMM_SELF,
> &client,MPI_ERRCODES_IGNORE);
>
> in my code, where i spawn off 2 processes. 'lobelia' is the one i treat
> as master out of the two spawned. is there another way of spawning only
> one process?

There shouldn't be a problem with spawning fewer than 2 processes -- what
exactly happens when you spawn 1?

> 2. because my update_master3 code exits upon completion, when i run
> MEX_master.c a second time, with the intention of just connecting to
> 'lobelia' instead of having to spawn it again, MATLAB script does see
> the published name 'lobelia', but because i exited update_master3.c upon
> completion of the previous invocation, update_master3.c is no longer
> running on 'lobelia'. how can i go about keeping 'lobelia' alive so
> that in subsequent calls to MEX_master.c, it can still connect to
> 'lobelia'. do i just not exit the function update_master3.c? does that
> mean i should put in a loop somewhere? not clear on how to implement
> this!!! please help!

Yes -- your master3 program will need to loop instead of exit. Hence, it
will become an "event driven" program, and the matlab script tells it what
to do. I would suggest that your slaves use the same model -- it would be
a lot more efficient to do this rather than re-spawn everything every time
(spawning is not a fast process).

So the main loop of master3 would be something like (typed off the top
of my head -- pardon typos):

  /* Publish the name */
  MPI_Publish_name(...);

  /* Spawn all the slaves */
  MPI_Comm_spawn(...);

  while (1) {
    /* Get a connection from the matlab script */
    MPI_Comm_accept(...);

    /* Receive the commands from the matlab script */
    MPI_Recv(&command, 1, MPI_INT, 0, COMMAND_TAG, accept_comm,
             MPI_STATUS_IGNORE);
    if (command == DIE_COMMAND) {
      send_slave_die_message();
      break;
    } else if (command == INITIAL_DATA_COMMAND) {
      receive_data_from_master();
      scatter_data_to_slaves();
      start_slaves_working();
      receive_slave_outputs();
      send_outputs_to_matlab();
    } else if (command == UPDATE_DATA_COMMAND) {
      receive_updates_from_master();
      scatter_updates_to_slaves();
      start_slaves_working();
      receive_slave_outputs();
      send_outputs_to_matlab();
    } else {
      printf("Master got unrecognized command! (%d)\n", command);
      MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Disconnect from the matlab script */
    MPI_Comm_disconnect(...);
  }
  MPI_Unpublish_name(...);
  MPI_Finalize();

It'll probably need to be a little more elaborate than that (indeed, I
made some assumptions about how Matlab works, etc.), but you get the
basic idea.
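
The command codes and the tag used in the loop above are not defined anywhere in this thread. A minimal sketch of definitions that both MEX_master.c and the master could share might look like the following. The numeric values and the helper command_name() are assumptions made up for illustration; they are not part of LAM/MPI, and the only real requirement is that both sides agree on them.

```c
#include <stddef.h>

/* Hypothetical protocol constants shared by the MEX code and the
   master.  The values are arbitrary; they only need to match on
   both ends of the connection. */
#define COMMAND_TAG 100

enum command {
  DIE_COMMAND = 0,
  INITIAL_DATA_COMMAND = 1,
  UPDATE_DATA_COMMAND = 2
};

/* Return a printable name for a command, or NULL if it is unknown.
   Useful for the "unrecognized command" diagnostic. */
const char *command_name(int command)
{
  switch (command) {
  case DIE_COMMAND:          return "DIE";
  case INITIAL_DATA_COMMAND: return "INITIAL_DATA";
  case UPDATE_DATA_COMMAND:  return "UPDATE_DATA";
  default:                   return NULL;
  }
}
```

Keeping these in one header means a change to the protocol cannot silently leave the MEX side and the master disagreeing about what a given integer means.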

>
> 3. how do i have MEX_master.c tell 'lobelia' to PLEASE DIE NOW?

Connect and send it a message with the command == DIE_COMMAND. This
will cause the logic above to quit the loop, unpublish the name, and
call MPI_Finalize to quit in an orderly fashion.

Your slave code will operate on the same principle, but you won't need
to connect/accept with them every time. You can see above that the
Master will spawn all the slaves at once. They'll have a similar
while(1) loop to receive commands from the master. Probably 4 main
commands:

- receive scatter of initial A values
- receive updates of A values
- perform an iteration and send the results to the master
- time to die
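
The four slave commands listed above can be sketched as a small handler that the slave's while(1) loop would call once per received message. This is only a sketch under assumptions: the names slave_state, slave_handle, the SLAVE_* codes, and MAX_A are hypothetical, the "iteration" is a placeholder that just sums the A values, and the MPI receive and send calls are left out so that the control flow stands on its own.

```c
#define MAX_A 16  /* hypothetical upper bound on the local A values */

/* Hypothetical command codes sent by the master to each slave. */
enum slave_command {
  SLAVE_INITIAL_A,  /* receive scatter of initial A values */
  SLAVE_UPDATE_A,   /* receive updates of A values */
  SLAVE_ITERATE,    /* perform an iteration, report the result */
  SLAVE_DIE         /* time to die */
};

struct slave_state {
  double a[MAX_A];  /* persists between commands */
  int n;            /* number of valid entries in a[] */
  double result;    /* output of the most recent iteration */
};

/* Handle one command.  Returns 1 to keep looping, 0 to leave the loop.
   In a real slave the data would arrive from the master via MPI. */
int slave_handle(struct slave_state *s, int command,
                 const double *data, int count)
{
  int i;

  switch (command) {
  case SLAVE_INITIAL_A:
    if (count < 0 || count > MAX_A) return 0;  /* refuse bad sizes */
    for (i = 0; i < count; ++i) s->a[i] = data[i];
    s->n = count;
    return 1;
  case SLAVE_UPDATE_A:
    /* Overwrite only as many values as we already hold. */
    for (i = 0; i < count && i < s->n; ++i) s->a[i] = data[i];
    return 1;
  case SLAVE_ITERATE:
    /* Placeholder computation: sum the current A values. */
    s->result = 0.0;
    for (i = 0; i < s->n; ++i) s->result += s->a[i];
    return 1;
  case SLAVE_DIE:
  default:
    return 0;  /* stop on DIE, and also on anything unrecognized */
  }
}
```

The point of the struct is that the A values survive from one command to the next, which is exactly what the Matlab C code cannot do. In the real program the loop body would receive a command from the master, call the handler, and send s->result back after each iterate command.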

Again, these are general suggestions -- I'm sure that you'll have to
tweak these ideas to fit your specific requirements, but I think it
should be a good enough general framework to let you do what you want.

The whole point here is that since Matlab doesn't play nicely with MPI
(i.e., it unloads your C code when the function finishes), you can't
have any persistent state in your Matlab C code. Hence, you have to
spawn a standalone master (who, in turn, spawns a set of persistent
slaves) that persists through the entire matlab job, even though
Matlab unloads your C code after every call (that's an assumption, but
that's what it sounds like Matlab is doing).

Hope that helps...

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/