The title may not explain the real problem well.
Actually, I never have too many processes running at the same time, but I do create many empty processes, one by one. Each of them does absolutely nothing and finishes right away, and then the next one is created. (My example is at the end of this message; it's just a few lines.)
The problem is that my program always crashes at around the 1000th child. I would expect that if 1000 processes were running at the same time, but that is not the case. In my real code the number of concurrent processes never gets higher than 4, and at around the 1000th spawn I always get this:
MPI_Comm_spawn: error spawning process: Invalid argument (rank 0, MPI_COMM_SELF)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Comm_spawn()
Rank (0, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 12602 failed on node n0 (200.20.15.194) with exit status 22.
-----------------------------------------------------------------------------
Are there any restrictions that I'm not aware of (for example, on the number of times a process can call MPI_Comm_spawn)? I would also like to confirm that this doesn't happen only to me.
Below is the code I used to reproduce the error. You can even make it so that the next child is created only when the current one finishes executing, but the error still occurs.
#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
{
    int p, my_rank;
    int i, message = 0;
    MPI_Comm parentcomm, intercomm;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parentcomm);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    if (parentcomm == MPI_COMM_NULL)
    {
        /* Parent: spawn the children one after another */
        for (i = 0; i < 2000; i++) {
            MPI_Comm_spawn("spawn_test", MPI_ARGV_NULL, p, MPI_INFO_NULL, 0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
            printf("Child number: %d\n", i);
            // Uncomment the MPI_Recv below and the MPI_Send in the else branch
            // to restrict the number of concurrent child processes to one
            // MPI_Recv(&message, 1, MPI_INT, 0, 0, intercomm, &status);
        }
    }
    else
    {
        /* Child: does nothing, optionally tells the parent it is done */
        // MPI_Send(&message, 1, MPI_INT, 0, 0, parentcomm);
    }

    MPI_Finalize();
    return 0;
}