Hi Everyone!
I'm solving a large CFD problem on varying numbers of processors. The
CFD problem is almost perfectly load-balanced, with 84 blocks of equal
size (and workload). Timings for processor counts up to 42 are reasonable,
although one begins to see the effect of communication time.
However, when I increase the number of CPUs from 42 to 84, the job stalls
almost completely: the run time increases from around 60 seconds to almost
13,000 seconds (!).
A closer look at the times for individual parts of the job
reveals that a limited number of calls (approximately 120) to
MPI_ALLGATHERV is responsible for the entire increase.
I double-checked this conclusion by leaving out these calls (which
changes the computed results slightly), and the time dropped back to
around 60 seconds.
Each call to MPI_ALLGATHERV gathers about 43 KB of double-precision
data, an equal amount from each processor.
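For reference, the calls in question look roughly like the sketch below
(the buffer names, sizes, and the C binding are my own illustration, not
the actual CFD code; the real code just uses equal counts for all ranks):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Illustrative size only; in the real job the total gathered per
     * call is on the order of 43 KB of doubles. */
    int local_n = 512;
    double *sendbuf = malloc(local_n * sizeof(double));
    double *recvbuf = malloc((size_t)local_n * nprocs * sizeof(double));
    for (int i = 0; i < local_n; i++)
        sendbuf[i] = (double)rank;               /* dummy data */

    int *counts = malloc(nprocs * sizeof(int));
    int *displs = malloc(nprocs * sizeof(int));
    for (int i = 0; i < nprocs; i++) {
        counts[i] = local_n;       /* equal contribution from each rank */
        displs[i] = i * local_n;   /* contiguous placement in recvbuf   */
    }

    MPI_Allgatherv(sendbuf, local_n, MPI_DOUBLE,
                   recvbuf, counts, displs, MPI_DOUBLE,
                   MPI_COMM_WORLD);

    free(sendbuf); free(recvbuf); free(counts); free(displs);
    MPI_Finalize();
    return 0;
}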
Since the gather operation is genuinely needed here, can anything be
done to alleviate this situation? Has anybody seen anything like it?
Best regards, Jess Michelsen