Hello everyone,
I am trying to use the canonical MPI_Comm_spawn() example from the MPI-2 report
(see, for example,
http://www-unix.mcs.anl.gov/mpi/mpi-standard/mpi-report-2.0/node98.htm ).
The environment is Red Hat Enterprise Linux ES 3 with LAM/MPI 7.1.2.
The problem is that the slaves are spawned fine, but they seem to hang in
MPI_Init() while the master is still blocked in MPI_Comm_spawn(). Some
googling didn't yield any usable results. Is there something I'm missing?
Both programs are attached below. The only difference from the code at the
URL above is the changed diagnostics.
Thanks in advance,
-- Alexander L. Belikoff
P.S. In case this is read by the person responsible for the mailing list:
the list's own search is broken and doesn't return any results. Searching
the site via Google does work, but every document shows up with the same
title, "LAM/MPI General User's Mailing List Archives." This should be easy
to fix: the code that generates the PHP page for each message should set
the title to the message's actual subject line instead of the generic one.
============================================================================
/* manager */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define error(s) { fprintf(stderr, "FATAL: " s "\n"); exit(1); }

int main(int argc, char *argv[])
{
    int world_size, universe_size, *universe_sizep, flag;
    MPI_Comm everyone;                     /* intercommunicator */
    char worker_program[100] = "mpi_worker";

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    if (world_size != 1) error("Top heavy with management");

    MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE,
                 &universe_sizep, &flag);
    if (!flag) {
        printf("This MPI does not support UNIVERSE_SIZE. How many\n"
               "processes total? ");
        scanf("%d", &universe_size);
    } else {
        universe_size = *universe_sizep;
    }
    if (universe_size == 1) error("No room to start workers");

    /*
     * Now spawn the workers. Note that there is a run-time determination
     * of what type of worker to spawn, and presumably this calculation
     * must be done at run time and cannot be calculated before starting
     * the program. If everything is known when the application is first
     * started, it is generally better to start them all at once in a
     * single MPI_COMM_WORLD.
     */
    fprintf(stderr, "will spawn %d slaves\n", universe_size - 1);
    MPI_Comm_spawn(worker_program, MPI_ARGV_NULL, universe_size - 1,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF, &everyone,
                   MPI_ERRCODES_IGNORE);

    /*
     * Parallel code here. The communicator "everyone" can be used to
     * communicate with the spawned processes, which have ranks
     * 0 .. universe_size-2 in the remote group of the intercommunicator
     * "everyone".
     */

    MPI_Finalize();
    return 0;
}
===================================================================
/* worker */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define error(s) { fprintf(stderr, "FATAL: " s "\n"); exit(1); }

int main(int argc, char *argv[])
{
    int size;
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL) error("No parent!");
    MPI_Comm_remote_size(parent, &size);
    if (size != 1) error("Something's wrong with the parent");

    /*
     * Parallel code here.
     * The manager is represented as the process with rank 0 in (the
     * remote group of) the parent intercommunicator. If the workers need
     * to communicate among themselves, they can use MPI_COMM_WORLD.
     */

    MPI_Finalize();
    return 0;
}
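
For completeness, here is the minimal handshake I was planning to put in the
"parallel code here" sections to confirm that both sides actually get past
the spawn. This is only a sketch of my own (it is not part of the MPI-2
example), using the variable names from the programs above:

```c
/* Manager side, after MPI_Comm_spawn() returns: broadcast a token from
 * the manager (the sole process in the local group, so it passes
 * MPI_ROOT) to every worker in the remote group. */
int token = 42;
MPI_Bcast(&token, 1, MPI_INT, MPI_ROOT, everyone);

/* Worker side, after MPI_Comm_get_parent(): receive the broadcast,
 * naming the root by its rank (0) in the remote group. */
int token;
MPI_Bcast(&token, 1, MPI_INT, 0, parent);
fprintf(stderr, "worker got token %d\n", token);
```

If the workers never print the token, the hang is happening before any
communication takes place, i.e. inside MPI_Init()/MPI_Comm_spawn() itself.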