
LAM/MPI General User's Mailing List Archives


From: Raymond M (raymondsgroupmail_at_[hidden])
Date: 2010-12-03 17:02:17


There's a cool example at
http://www.lam-mpi.org/tutorials/one-step/ezstart.php that has helped me
get started with MPI and get some great results, but I think I see a
large inefficiency in the shutdown logic. In my application, my
"slaves" complete at widely varying times. I'm wondering if those more
familiar with how MPI deallocates nodes on a cluster might comment. I
am particularly interested in developing more demand-aware code and
scripting that senses (1) how many cluster processors are available,
using that maximum rather than some hard-coded limit, and (2) how many
other users are queued... possibly even readjusting the number of
processors in use during a run. If someone has examples like that,
please point me to them :-)

What I see in the example below, as posted, is a master that waits
for all slaves to complete before killing any of them. I think what
I'd like is to combine both loops: still receive with MPI_ANY_SOURCE,
but then send the DIETAG immediately to that particular slave. That
would be a change from the second loop, which kills the slaves in
numerical order. My question to the experts is: will this change free
up the slave so that a different MPI job can start on it? If it's not
enough to do so, then what else would I need to do?

- Raymond

excerpt from the posted example code:

  /* There's no more work to be done, so receive all the outstanding
     results from the slaves. */

  for (rank = 1; rank < ntasks; ++rank) {
    MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
             MPI_ANY_TAG, MPI_COMM_WORLD, &status);
  }

  /* Tell all the slaves to exit by sending an empty message with the
     DIETAG. */

  for (rank = 1; rank < ntasks; ++rank) {
    MPI_Send(0, 0, MPI_INT, rank, DIETAG, MPI_COMM_WORLD);
  }
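
For reference, here's a sketch of the combined loop I have in mind (untested; it relies on status.MPI_SOURCE to identify which slave just reported, and assumes the same variables as the excerpt above):

```c
  /* Combined drain-and-kill loop: as each outstanding result arrives,
     send DIETAG straight back to the slave that produced it, rather
     than collecting all results before killing anyone. Slaves that
     finish early get released early. */
  for (rank = 1; rank < ntasks; ++rank) {
    MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
             MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* status.MPI_SOURCE is the rank of the slave that just finished. */
    MPI_Send(0, 0, MPI_INT, status.MPI_SOURCE, DIETAG, MPI_COMM_WORLD);
  }
```

Note that whether the slave's node then becomes available to other jobs is up to the launcher and scheduler, not this loop: the slave process itself only exits after it receives DIETAG and calls MPI_Finalize.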