There's a cool example at
http://www.lam-mpi.org/tutorials/one-step/ezstart.php. It has helped
me get started with MPI and get some great results, but I think I see
a large inefficiency in the shutdown logic. In my application, my
"slaves" complete at widely varying times. I'm wondering if those
more familiar with how MPI deallocates nodes on a cluster might
comment. I'm particularly interested in developing more demand-aware
code and scripting that senses (1) how many cluster processors are
available, using the maximum rather than some hard-coded limit, and
(2) how many other users are queued... possibly even readjusting the
number of processors in use during a run. If someone has examples
like that, please point me to them :-)
What I see in the example below, as posted, is a master that waits
for all slaves to complete before killing any of them. I think what
I'd like is to combine both loops: still use MPI_ANY_SOURCE, but send
the DIETAG immediately to that particular slave. That would be a
change from the second loop, which kills the slaves in numerical
order. My question to the experts is: will this change free up the
slave so that a different MPI job can start on it? If it's not enough
to do so, what else would I need to do?
- Raymond
excerpt from the posted example code:
/* There's no more work to be done, so receive all the outstanding
   results from the slaves. */
for (rank = 1; rank < ntasks; ++rank) {
    MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
             MPI_ANY_TAG, MPI_COMM_WORLD, &status);
}

/* Tell all the slaves to exit by sending an empty message with the
   DIETAG. */
for (rank = 1; rank < ntasks; ++rank) {
    MPI_Send(0, 0, MPI_INT, rank, DIETAG, MPI_COMM_WORLD);
}
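Concretely, the combined loop I have in mind would look something like this (a sketch only, reusing result, ntasks, DIETAG, and status from the posted code). The key is status.MPI_SOURCE, which identifies which slave just reported, so the DIETAG can go straight back to it:

```c
/* Sketch: once the work queue is empty, retire each slave as soon as
   its last result arrives, instead of waiting for all of them. */
for (rank = 1; rank < ntasks; ++rank) {
    /* Take the next result from whichever slave finishes first. */
    MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE,
             MPI_ANY_TAG, MPI_COMM_WORLD, &status);

    /* status.MPI_SOURCE is the rank that just reported; kill that
       particular slave immediately rather than in rank order. */
    MPI_Send(0, 0, MPI_INT, status.MPI_SOURCE, DIETAG, MPI_COMM_WORLD);
}
```

This keeps MPI_ANY_SOURCE, so the fastest slaves are told to exit first; whether the node then becomes available to a different MPI job is exactly the part I'm asking the experts about.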