Hi everyone!
Although my rough estimates of the network traffic generated by the
allgather function do not come close to explaining the huge increase in
time I measured, I have now successfully replaced the code in question
with a gather-scatter scheme. As a result, the computations in between
are performed on one node instead of on every node. Total time has now
dropped to 42.6 seconds, which is even below the best estimate.
Best regards, Jess Michelsen
On Tue, 2004-01-06 at 18:31, Brian Barrett wrote:
> On Jan 6, 2004, at 9:03 AM, jess michelsen wrote:
>
> > However, when I increase the number of CPUs from 42 to 84, the job
> > practically stalls. Time increases from around 60 seconds to almost
> > 13,000 seconds (!). A closer look at the times for individual parts of
> > the job reveals that a limited number of calls (approximately 120) to
> > MPI_ALLGATHERV account for the entire increase in run time.
> > I double-checked this conclusion by leaving out these calls (which
> > changes the computed results slightly), and the time was back to
> > around 60 seconds.
>
> Unfortunately, MPI_ALLGATHERV is a rather expensive operation: each
> processor sends those 43KB of data to every other processor. So
> while you only doubled the number of nodes, you drastically increased
> the amount of data going out on the network (the total grows roughly
> with the square of the number of processes). The MPI_ALL* functions
> are always going to be expensive, so you may want to see if there is a
> way to remove those functions from your program's inner loops. If you
> can factor your application so that data is only sent to nearest
> neighbors or something like that, you will find your application scales
> much better - global operations just don't scale :(.
>
> We are working on the LAM collective operations to improve performance,
> especially on large numbers of nodes and on SMP machines. LAM 7.0
> provided better performance for many of the basic collectives. The
> more complex operations are on their way :).
>
>
> Hope this helps,
>
> Brian
>
> --
> Brian Barrett
> LAM/MPI developer and all around nice guy
> Have a LAM/MPI day: http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/