
LAM/MPI General User's Mailing List Archives


From: qcqc_at_[hidden]
Date: 2005-07-17 07:24:37


Greetings all.
The program I am writing is my first ever use of MPI. It is meant to be a small optimization program: a genetic algorithm that modifies data which is fed into Network Simulator 2 (ns2). The total population is always the same, but each process gets a share of it depending on the number of machines (it runs on a Debian cluster, with NFS loading of the system). Bash scripts later adapt the data coming back from ns2 and feed it into the program again.
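Roughly, the split looks like this (a simplified sketch rather than my exact code; total_population and nprocs are placeholder names here):

//<C++ CODE>
    // each of the nprocs processes works on an equal share of the population
    // (remainder handling omitted in this sketch)
    int pop = total_population / nprocs;
//</C++ CODE>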
Anyway, I have some questions concerning sending and receiving data.

This code runs in all processes:

//<C++ CODE>

    cout << "\nGATHER " << mynum;
    MPI_Barrier(MPI_COMM_WORLD);

    // gather the per-process results on rank 0
    MPI_Gather(punkty, pop, MPI_FLOAT, pktall, pop, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Gather(delay, pop, tablicafloat, delayall, pop, tablicafloat, 0, MPI_COMM_WORLD);
    MPI_Gather(size, pop, tablicalong, sizeall, pop, tablicalong, 0, MPI_COMM_WORLD);

    cout << "\n END OF GATHER";

    // ... a part of the code executed only by the process with mynum == 0 ...

    cout << "\n I am processor nr : " << mynum;
    cout << "\nSCATTER";
    MPI_Barrier(MPI_COMM_WORLD);

    // distribute the modified data back to all processes
    MPI_Scatter(delayall, pop, tablicafloat, delay, pop, tablicafloat, 0, MPI_COMM_WORLD);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Scatter(sizeall, pop, tablicalong, size, pop, tablicalong, 0, MPI_COMM_WORLD);

//</C++ CODE>

Output generated when 4 processors run it:

//<OUTPUT>

GATHER : 1
GATHER : 2
END OF GATHER : 1
I am processor nr : 1
END OF GATHER : 2
I am processor nr : 2
GATHER : 3
END OF GATHER : 3
I am processor nr : 3
SCATTERMPI_Recv: process in local group is dead (rank 1, MPI_COMM_WORLD)
SCATTERMPI_Recv: process in local group is dead (rank 2, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - MPI_Barrier()
Rank (1, MPI_COMM_WORLD): - main()
Rank (2, MPI_COMM_WORLD): Call stack within LAM:
Rank (2, MPI_COMM_WORLD): - MPI_Recv()
Rank (2, MPI_COMM_WORLD): - MPI_Barrier()
Rank (2, MPI_COMM_WORLD): - main()
SCATTERMPI_Recv: process in local group is dead (rank 3, MPI_COMM_WORLD)
Rank (3, MPI_COMM_WORLD): Call stack within LAM:
Rank (3, MPI_COMM_WORLD): - MPI_Recv()
Rank (3, MPI_COMM_WORLD): - MPI_Barrier()
Rank (3, MPI_COMM_WORLD): - main()
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 29261 failed on node n1 (192.168.0.3) due to signal 11.
-----------------------------------------------------------------------------
GATHER : 0
END OF GATHER : 0

//</OUTPUT>
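
In case a self-contained test helps, below is a minimal standalone version of the same gather/scatter pattern with a contiguous datatype that, as far as I understand the MPI calls, should be correct. The array names and sizes are placeholders, not my real data:

//<C++ CODE>
#include <mpi.h>
#include <iostream>
using namespace std;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int mynum, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &mynum);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int pop = 4;                    // individuals per process (placeholder)

    // a pair of floats per individual, like tablicafloat
    MPI_Datatype pairfloat;
    MPI_Type_contiguous(2, MPI_FLOAT, &pairfloat);
    MPI_Type_commit(&pairfloat);

    // each process sends pop pairs (= pop*2 floats), so the
    // root-side buffer needs room for nprocs*pop pairs
    float *delay    = new float[pop * 2];
    float *delayall = new float[nprocs * pop * 2];
    for (int i = 0; i < pop * 2; ++i)
        delay[i] = (float) mynum;

    MPI_Gather(delay, pop, pairfloat, delayall, pop, pairfloat, 0, MPI_COMM_WORLD);

    // ... rank 0 would modify delayall here ...

    MPI_Scatter(delayall, pop, pairfloat, delay, pop, pairfloat, 0, MPI_COMM_WORLD);

    cout << "rank " << mynum << " got back " << delay[0] << endl;

    delete [] delay;
    delete [] delayall;
    MPI_Type_free(&pairfloat);
    MPI_Finalize();
    return 0;
}
//</C++ CODE>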

I have been playing around with this for two days and have no idea why it is happening.

The program works on a single processor, including the gathering and scattering (which is obvious, since the process sends to and receives from itself). I must add that I removed MPI_Finalize, because it makes the program crash even on a single processor. I tried to find the cause of that crash but could not; I made sure I free all the allocated memory, and I do, and I had no other ideas. So maybe the missing MPI_Finalize is the reason MPI_Barrier is not working properly (I am not sure whether it uses system semaphores or something similar). MPI_Barrier itself returns MPI_SUCCESS when called.
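
For what it is worth, this is the shutdown order I understand to be correct (a sketch of what I think the end of main() should look like, using the type names from PS2 below):

//<C++ CODE>
    // free the committed types before finalizing
    MPI_Type_free(&tablicafloat);
    MPI_Type_free(&tablicalong);

    MPI_Finalize();   // last MPI call in the program
    return 0;         // explicit return code, as the mpirun message suggests
//</C++ CODE>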

Please help. Even a hint about where to look would be worth gold.

Regards Krzysztof Korzunowicz

PS. The output and code may differ slightly from the originals, because I translated both into English for the reader's sake.

PS2. Definitions of tablicafloat and tablicalong:

    MPI_Datatype tablicafloat, tablicalong;

    // each element of these types is a contiguous pair (2 floats / 2 longs)
    MPI_Type_contiguous(2, MPI_FLOAT, &tablicafloat);
    MPI_Type_contiguous(2, MPI_LONG, &tablicalong);

    MPI_Type_commit(&tablicafloat);
    MPI_Type_commit(&tablicalong);
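
To be explicit about the sizing I believe these types imply (nprocs is a placeholder name for the number of processes):

//<C++ CODE>
    // each tablicafloat / tablicalong element covers a pair, so a send
    // count of pop means pop pairs per process, and the root-side
    // buffers have to hold nprocs * pop such pairs
    float *delay    = new float[pop * 2];
    float *delayall = new float[nprocs * pop * 2];
    long  *size     = new long[pop * 2];
    long  *sizeall  = new long[nprocs * pop * 2];
//</C++ CODE>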