Greetings all.
The program I am creating is my first ever use of MPI. It is supposed to be a small optimization program: a genetic algorithm that modifies data fed into Network Simulator 2 (ns2). The total population is always the same, but each process gets a part of it depending on the number of machines (it runs on a Debian cluster with NFS loading of the system). Later, bash scripts modify the data coming from ns2 accordingly and put it back into the program.
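To make the setup clearer, here is a minimal sketch of how I divide the population per process (simplified; TOTAL_POP and the assumption that it divides evenly are purely illustrative, my real program gets these values elsewhere):
//<C++ CODE>
#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int mynum = 0, numprocs = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &mynum);    // this process's rank
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs); // number of processes in the run

    const int TOTAL_POP = 40;        // total population (illustrative value)
    int pop = TOTAL_POP / numprocs;  // individuals handled by this process (assumes it divides evenly)

    std::cout << "process " << mynum << " handles " << pop << " individuals\n";

    MPI_Finalize();
    return 0;
}
//</C++ CODE>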
Anyway, I have questions concerning sending and receiving data.
This code runs in all processes:
//<C++ CODE>
cout << "\nGATHER " << mynum;
MPI_Barrier(MPI_COMM_WORLD);
// gather every process's pop points / delays / sizes onto rank 0
MPI_Gather(punkty, pop, MPI_FLOAT, pktall, pop, MPI_FLOAT, 0, MPI_COMM_WORLD);
MPI_Gather(delay, pop, tablicafloat, delayall, pop, tablicafloat, 0, MPI_COMM_WORLD);
MPI_Gather(size, pop, tablicalong, sizeall, pop, tablicalong, 0, MPI_COMM_WORLD);
cout << "\n END OF GATHER";
// ... a part of the code executed only by the processor with mynum == 0
cout << "\n I am processor nr : " << mynum;
cout << "\nSCATTER";
MPI_Barrier(MPI_COMM_WORLD);
// distribute the recombined delays and sizes back to all processes
MPI_Scatter(delayall, pop, tablicafloat, delay, pop, tablicafloat, 0, MPI_COMM_WORLD);
MPI_Barrier(MPI_COMM_WORLD);
MPI_Scatter(sizeall, pop, tablicalong, size, pop, tablicalong, 0, MPI_COMM_WORLD);
//</C++ CODE>
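In case the buffer sizes are relevant, this is how I understand the sizing those calls require (a sketch only, not copied from my program; pop and numprocs are assumed to have been set earlier, numprocs via MPI_Comm_size):
//<C++ CODE>
float *punkty   = new float[pop];                // pop floats sent by each process
float *pktall   = new float[numprocs * pop];     // root receives pop floats from every process
float *delay    = new float[pop * 2];            // pop elements of tablicafloat = 2 floats each
float *delayall = new float[numprocs * pop * 2]; // one block of pop tablicafloat per process
long  *size     = new long[pop * 2];             // pop elements of tablicalong = 2 longs each
long  *sizeall  = new long[numprocs * pop * 2];  // one block of pop tablicalong per process
//</C++ CODE>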
Output generated when 4 processors run it:
//NORMAL OUTPUT
GATHER : 1
GATHER : 2
END OF GATHER : 1
I am processor nr : 1
END OF GATHER : 2
I am processor nr : 2
GATHER : 3
END OF GATHER : 3
I am processor nr : 3
SCATTERMPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
SCATTERMPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - MPI_Barrier()
Rank (1, MPI_COMM_WORLD): - main()
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): - MPI_Recv()
Rank (2, MPI_COMM_WORLD): - MPI_Barrier()
Rank (2, MPI_COMM_WORLD): - main()
SCATTERMPI_Recv: process in local group is dead (rank 3, MPI_COMM_WORLD)
Rank (3, MPI_COMM_WORLD): Call stack within LAM:
Rank (3, MPI_COMM_WORLD): - MPI_Recv()
Rank (3, MPI_COMM_WORLD): - MPI_Barrier()
Rank (3, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 29261 failed on node n1 (192.168.0.3) due to signal 11.
-----------------------------------------------------------------------------
GATHER : 0
END OF GATHER : 0
//NORMAL OUTPUT
I have been playing around with this for 2 days and I have no idea why it is happening.
The program works on a single processor (including the gathering and scattering, but that is obvious, since the process sends to and receives from itself). What I must add is that I got rid of MPI_Finalize; it causes the program to crash on a single processor. I tried to find the cause but could not. I made sure I clean up all the allocated memory, and I do. I had no other ideas for what could cause the crash. So maybe the lack of MPI_Finalize is why MPI_Barrier is not working; I am not sure whether it uses system semaphores or something similar. MPI_Barrier returns MPI_SUCCESS when called.
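For reference, the overall structure I am aiming for looks roughly like this (a stripped-down sketch, not my real program):
//<C++ CODE>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    MPI_Datatype tablicafloat, tablicalong;
    MPI_Type_contiguous(2, MPI_FLOAT, &tablicafloat);
    MPI_Type_contiguous(2, MPI_LONG, &tablicalong);
    MPI_Type_commit(&tablicafloat);
    MPI_Type_commit(&tablicalong);

    // ... allocate buffers, run the GA loop with the MPI_Gather / MPI_Scatter calls above ...

    // free the committed types and shut MPI down cleanly
    MPI_Type_free(&tablicafloat);
    MPI_Type_free(&tablicalong);
    MPI_Finalize();   // the call I had to remove to stop the single-processor crash
    return 0;
}
//</C++ CODE>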
Please help. Even a hint about where to look would be worth gold.
Regards, Krzysztof Korzunowicz
PS. The output and code may differ slightly because I translated both into English for the typical reader's sake.
PS2. Definitions of tablicafloat and tablicalong:
//<C++ CODE>
MPI_Datatype tablicafloat, tablicalong;
MPI_Type_contiguous(2, MPI_FLOAT, &tablicafloat);
MPI_Type_contiguous(2, MPI_LONG, &tablicalong);
MPI_Type_commit(&tablicafloat);
MPI_Type_commit(&tablicalong);
//</C++ CODE>
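If it matters, this is the kind of sanity check I could run on the committed types (a sketch; it assumes the commits above have already executed and that my usual iostream setup is in place):
//<C++ CODE>
int fsize = 0, lsize = 0;
MPI_Type_size(tablicafloat, &fsize);  // expected: 2 * sizeof(float)
MPI_Type_size(tablicalong, &lsize);   // expected: 2 * sizeof(long)
cout << "tablicafloat: " << fsize << " bytes, tablicalong: " << lsize << " bytes\n";
//</C++ CODE>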