Hi,
I wrote an MPI program for solving linear equations, and it does give the correct
result. But when I run the solver repeatedly, it gets slower and slower. In one
case with two processors, the time for one step grew from 219 seconds for the
first step to 554 seconds for the 10th step, even though the calculations in each
step are exactly the same. It seems that the scheduling among the tasks is unbalanced.
Has anybody had this problem before? Please help me.
The output of my program looks like this:
SHELL% mpirun -ssi rpi sysv myapp
STEP 1:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 20.886864
Timer 1 for gather: count = 1468, time = 173.243905
Timer 2 for reduce: count = 6036, time = 5.449356
Timer 3 for bcast: count = 0, time = 0.941890
Process <1> timers:
Timer 0 for sub: count = 2937, time = 21.310844
Timer 1 for gather: count = 1468, time = 170.313529
Timer 2 for reduce: count = 6036, time = 8.222676
Timer 3 for bcast: count = 0, time = 0.943633
Total omnodr time = 219.992384
STEP 2:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 20.848251
Timer 1 for gather: count = 1468, time = 175.721152
Timer 2 for reduce: count = 6036, time = 11.647156
Timer 3 for bcast: count = 0, time = 0.516676
Process <1> timers:
Timer 0 for sub: count = 2937, time = 20.769629
Timer 1 for gather: count = 1468, time = 176.512609
Timer 2 for reduce: count = 6036, time = 11.098584
Timer 3 for bcast: count = 0, time = 0.516803
Total omnodr time = 227.987318
STEP 3:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 21.012736
Timer 1 for gather: count = 1468, time = 210.400191
Timer 2 for reduce: count = 6036, time = 17.359436
Timer 3 for bcast: count = 0, time = 0.490925
Process <1> timers:
Timer 0 for sub: count = 2937, time = 20.384681
Timer 1 for gather: count = 1468, time = 182.488757
Timer 2 for reduce: count = 6036, time = 46.003675
Timer 3 for bcast: count = 0, time = 0.493234
Total omnodr time = 268.607717
STEP 4:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 22.160519
Timer 1 for gather: count = 1468, time = 226.491551
Timer 2 for reduce: count = 6036, time = 22.529618
Timer 3 for bcast: count = 0, time = 0.544272
Process <1> timers:
Timer 0 for sub: count = 2937, time = 20.554751
Timer 1 for gather: count = 1468, time = 190.271343
Timer 2 for reduce: count = 6036, time = 60.288964
Timer 3 for bcast: count = 0, time = 0.546628
Total omnodr time = 291.045351
STEP 5:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 21.615419
Timer 1 for gather: count = 1468, time = 232.338246
Timer 2 for reduce: count = 6036, time = 27.635866
Timer 3 for bcast: count = 0, time = 0.512605
Process <1> timers:
Timer 0 for sub: count = 2937, time = 20.785000
Timer 1 for gather: count = 1468, time = 198.468036
Timer 2 for reduce: count = 6036, time = 62.600546
Timer 3 for bcast: count = 0, time = 0.515364
Total omnodr time = 302.223854
STEP 6:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 22.103874
Timer 1 for gather: count = 1468, time = 242.393103
Timer 2 for reduce: count = 6036, time = 31.386167
Timer 3 for bcast: count = 0, time = 0.576921
Process <1> timers:
Timer 0 for sub: count = 2937, time = 20.571535
Timer 1 for gather: count = 1468, time = 205.591732
Timer 2 for reduce: count = 6036, time = 71.268750
Timer 3 for bcast: count = 0, time = 0.581129
Total omnodr time = 317.488496
STEP 7:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 22.947837
Timer 1 for gather: count = 1468, time = 250.703343
Timer 2 for reduce: count = 6036, time = 39.540515
Timer 3 for bcast: count = 0, time = 0.538599
Process <1> timers:
Timer 0 for sub: count = 2937, time = 21.716292
Timer 1 for gather: count = 1468, time = 220.841175
Timer 2 for reduce: count = 6036, time = 72.211103
Timer 3 for bcast: count = 0, time = 0.542357
Total omnodr time = 335.016795
STEP 8:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 27.126867
Timer 1 for gather: count = 1468, time = 332.812285
Timer 2 for reduce: count = 6036, time = 104.659123
Timer 3 for bcast: count = 0, time = 0.734146
Process <1> timers:
Timer 0 for sub: count = 2937, time = 27.840139
Timer 1 for gather: count = 1468, time = 314.123519
Timer 2 for reduce: count = 6036, time = 122.871434
Timer 3 for bcast: count = 0, time = 0.738456
Total omnodr time = 490.470455
STEP 9:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 29.285080
Timer 1 for gather: count = 1468, time = 331.930461
Timer 2 for reduce: count = 6036, time = 147.903799
Timer 3 for bcast: count = 0, time = 0.920213
Process <1> timers:
Timer 0 for sub: count = 2937, time = 31.194333
Timer 1 for gather: count = 1468, time = 352.703100
Timer 2 for reduce: count = 6036, time = 124.809475
Timer 3 for bcast: count = 0, time = 0.925016
Total omnodr time = 536.432494
STEP 10:
Loading matrix from file 'matrix_l'...
Matrix file 'matrix_l' loaded.
Starting PCR: _g_size = 46280, _sp_size = 1094564, bandwidth = 23.650908
Parallel mode: MPI tasks = 2
Converged in 1468 iterations, e_residue = 4.4007069704E-08, omn count = 1467.
Process <0> timers:
Timer 0 for sub: count = 2937, time = 28.739998
Timer 1 for gather: count = 1468, time = 337.927366
Timer 2 for reduce: count = 6036, time = 160.852730
Timer 3 for bcast: count = 0, time = 0.729920
Process <1> timers:
Timer 0 for sub: count = 2937, time = 29.843883
Timer 1 for gather: count = 1468, time = 370.256357
Timer 2 for reduce: count = 6036, time = 125.977428
Timer 3 for bcast: count = 0, time = 0.735170
Total omnodr time = 554.968516
Master 0 safely stop.
Slave 1 safely stop.
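
To see which timers account for the growth, the output above can be summarized with a small script. This is only a sketch: the names `parse_log` and `summarize` are hypothetical helpers, the regular expressions assume exactly the log format shown above, and only steps 1 and 10 are embedded as sample input. It compares the slowest process's compute time ("sub") and communication time ("gather" + "reduce") against the first step.

```python
import re

# Patterns assume the exact log format printed by the program above.
STEP_RE = re.compile(r"STEP (\d+):")
PROC_RE = re.compile(r"Process <(\d+)> timers:")
TIMER_RE = re.compile(r"Timer \d+ for (\w+): count = \d+, time = ([\d.]+)")
TOTAL_RE = re.compile(r"Total omnodr time = ([\d.]+)")

# Sample input: steps 1 and 10 copied from the output above.
SAMPLE = """\
STEP 1:
Process <0> timers:
Timer 0 for sub: count = 2937, time = 20.886864
Timer 1 for gather: count = 1468, time = 173.243905
Timer 2 for reduce: count = 6036, time = 5.449356
Timer 3 for bcast: count = 0, time = 0.941890
Process <1> timers:
Timer 0 for sub: count = 2937, time = 21.310844
Timer 1 for gather: count = 1468, time = 170.313529
Timer 2 for reduce: count = 6036, time = 8.222676
Timer 3 for bcast: count = 0, time = 0.943633
Total omnodr time = 219.992384
STEP 10:
Process <0> timers:
Timer 0 for sub: count = 2937, time = 28.739998
Timer 1 for gather: count = 1468, time = 337.927366
Timer 2 for reduce: count = 6036, time = 160.852730
Timer 3 for bcast: count = 0, time = 0.729920
Process <1> timers:
Timer 0 for sub: count = 2937, time = 29.843883
Timer 1 for gather: count = 1468, time = 370.256357
Timer 2 for reduce: count = 6036, time = 125.977428
Timer 3 for bcast: count = 0, time = 0.735170
Total omnodr time = 554.968516
"""


def parse_log(text):
    """Return {step: {"procs": {rank: {timer: seconds}}, "total": seconds}}."""
    steps = {}
    step = proc = None
    for raw in text.splitlines():
        line = raw.strip()
        m = STEP_RE.match(line)
        if m:
            step = int(m.group(1))
            steps[step] = {"procs": {}, "total": None}
            proc = None
            continue
        if step is None:
            continue  # ignore anything before the first STEP header
        m = PROC_RE.match(line)
        if m:
            proc = int(m.group(1))
            steps[step]["procs"][proc] = {}
            continue
        m = TIMER_RE.match(line)
        if m and proc is not None:
            steps[step]["procs"][proc][m.group(1)] = float(m.group(2))
            continue
        m = TOTAL_RE.match(line)
        if m:
            steps[step]["total"] = float(m.group(1))
    return steps


def summarize(steps):
    """Express each step's times as a ratio to the earliest step."""
    def worst(step, names):
        # Slowest process for the given group of timers.
        return max(sum(t.get(n, 0.0) for n in names)
                   for t in step["procs"].values())

    base = steps[min(steps)]
    rows = []
    for n in sorted(steps):
        s = steps[n]
        rows.append({
            "step": n,
            "total_ratio": s["total"] / base["total"],
            "sub_ratio": worst(s, ["sub"]) / worst(base, ["sub"]),
            "comm_ratio": worst(s, ["gather", "reduce"])
                          / worst(base, ["gather", "reduce"]),
        })
    return rows


if __name__ == "__main__":
    for row in summarize(parse_log(SAMPLE)):
        print("step %(step)d: total x%(total_ratio).2f, "
              "sub x%(sub_ratio).2f, gather+reduce x%(comm_ratio).2f" % row)
```

On these two steps, the "sub" time grows by roughly 1.4x while "gather" plus "reduce" grows by roughly 2.8x, so most of the extra time is spent inside the communication calls. Note that those timers also include any time one process spends waiting for the other, so this alone does not show whether the cause is the message transport or an imbalance between the tasks.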