LAM/MPI General User's Mailing List Archives

From: Josh Hursey (jjhursey_at_[hidden])
Date: 2005-01-20 09:00:42


The behavior of collective operations, including MPI_Bcast, in LAM/MPI
is determined at runtime by the type of environment and system
architecture you are running on. I am assuming you are using LAM/MPI
7.0 or later for this run; if not, the answer may change slightly.

There are three main types of collective modules that may be used when a
process runs, and each module implements a different set of algorithms as
determined by your system. If none is specified explicitly, one of the
following modules is selected, in order of preference: 'shmem', 'smp',
then 'lam_basic'. On shared-memory systems the 'shmem' module is used,
and all operations occur in a shared segment of memory. On a cluster of
SMP machines the 'smp' module is commonly used; it implements a
MagPIe-like algorithm. For all other types of systems the 'lam_basic'
module is used, and with fewer than four processes this is a basic linear
algorithm. For a discussion of collective operations you might want to
look at Section 9.4 of the LAM/MPI User's Guide.
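
If you want to confirm or override which collective module you are
getting, LAM's SSI parameters should let you do that from the command
line. I am writing this from memory, so please check the exact syntax in
the User's Guide, but it should be something like:

  laminfo                                  # lists the SSI modules installed
  mpirun -np 2 -ssi coll lam_basic a.out   # forces the basic algorithms

Forcing each module in turn and re-running your timing loop would tell
you which one you are getting by default.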

Without knowing exactly what type of system you are running on, I can
only speculate about the reason for the performance boost that you
notice after N iterations. If you are using one of the shared memory
algorithms, then I would speculate that after the first 34 or so
iterations the operating system has stopped reclaiming the segment of
shared memory allocated by each MPI_Bcast, so performance improves
because the OS no longer has to create the shared segment and simply
notifies the processes that it is available. If you are not using a
shared memory module, then it may be that the switch is assisting in the
transfer by caching routes, thus decreasing the setup time at the switch.
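
If you want to pin down exactly where the transition happens, a
per-iteration timing loop like the rough sketch below should do it. This
is untested, and the buffer size and iteration count are made up, but it
shows the MPI_Wtime bracketing I have in mind:

  #include <stdio.h>
  #include <mpi.h>

  #define ITERATIONS 100
  #define COUNT      4096   /* doubles per broadcast; made-up size */

  int main(int argc, char **argv)
  {
      double buf[COUNT] = {0.0};
      double start, elapsed;
      int rank, i;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      for (i = 0; i < ITERATIONS; ++i) {
          MPI_Barrier(MPI_COMM_WORLD);   /* line everyone up first */
          start = MPI_Wtime();
          MPI_Bcast(buf, COUNT, MPI_DOUBLE, 0, MPI_COMM_WORLD);
          elapsed = MPI_Wtime() - start;
          if (rank == 0)
              printf("iteration %3d: %f sec\n", i, elapsed);
      }

      MPI_Finalize();
      return 0;
  }

Running this with and without a forced coll module (as above) should
also make it clearer whether the effect comes from the module itself or
from something outside LAM.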

Josh

On Jan 19, 2005, at 2:26 PM, Siva Bandhamravuri wrote:

>
> Hi,
> I am working on an n-body simulation which uses a lot of MPI_Bcast.
>
> I am using -np 2 for the runs.
> When I timed the MPI_Bcast for every iteration, I see this scenario
> come up.
>
> From iterations 0 - 34, the time is 0.08 seconds for each MPI_Bcast,
> and somewhere in the range of 34 - 38 the time reduces to 0.04 seconds
> for one iteration, and then to 0.001 for the rest of the iterations.
>
> Is this an optimization in LAM-MPI?
> Can anyone explain how MPI_Bcast is implemented in LAM-MPI?
>
> thanks
> Siva
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

----
Josh Hursey
jjhursey_at_[hidden]
http://www.lam-mpi.org/