Excellent explanation.
Also don't forget caching and virtual memory effects -- if you have
multiple processes sharing the same processor, they can potentially thrash
each other's caches, which can be disastrous for performance. This is
obviously highly dependent on the code that you're running, but codes
that are especially sensitive to cache sizes (typical number-crunching
codes, for example) may perform horribly when forced to share a
processor (and therefore share at least some form of a cache).
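If you want to see the cache-size effect on your own machine, here's a
minimal sketch (my own toy example, not taken from any real code): it
times a fixed number of strided reads over working sets of increasing
size. Throughput usually falls off sharply once the working set stops
fitting in cache -- which is roughly what two co-scheduled processes
can do to each other when they share one.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        const size_t max_bytes = 32 * 1024 * 1024; /* assumed > cache size */
        const size_t accesses  = 64 * 1024 * 1024; /* same work per trial */
        char *buf = calloc(max_bytes, 1);
        if (buf == NULL)
            return 1;

        /* Double the working set each trial; the access count stays fixed,
           so slower trials reflect cache misses, not extra work. */
        for (size_t ws = 16 * 1024; ws <= max_bytes; ws *= 2) {
            volatile char sink = 0;  /* volatile keeps the reads alive */
            size_t i = 0;
            clock_t t0 = clock();
            for (size_t n = 0; n < accesses; n++) {
                sink += buf[i];
                i = (i + 64) % ws;   /* jump one 64-byte cache line */
            }
            printf("working set %8lu KB: %.2f s\n",
                   (unsigned long)(ws / 1024),
                   (double)(clock() - t0) / CLOCKS_PER_SEC);
        }
        free(buf);
        return 0;
    }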
Also, if your processes take up more memory than there is physical RAM
available, you're going to be [potentially] thrashing the virtual memory
system. This makes performance go from bad to worse. This can obviously
happen regardless of how many processors you have (on a single machine),
but it's a related common mistake and is worth mentioning here.
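A quick sanity check before launching N processes on one box is to
compare their combined footprint against physical RAM. Here's a small
sketch (it assumes a system supporting the non-standard but widely
available _SC_PHYS_PAGES sysconf value, e.g. Linux and most Unixes):

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        /* Query the number of physical pages and the page size,
           then report total physical RAM in megabytes. */
        long long pages     = sysconf(_SC_PHYS_PAGES);
        long long page_size = sysconf(_SC_PAGE_SIZE);
        printf("physical RAM: %lld MB\n",
               pages * page_size / (1024 * 1024));
        return 0;
    }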
The short version is: for HPC kinds of applications, 4 processes running
on a single processor is usually nowhere near the same thing as 4
processes running on 4 processors.
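For concreteness, here's a minimal sketch of the scenario described in
the quoted message below -- each rank does some local computation and
then everyone reduces to rank 0 with MPI_Reduce. The dummy work loop is
just an illustrative placeholder; try running it with something like
"mpirun -np 4 a.out" on one processor vs. four and compare the
wall-clock times.

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Some local computation (a dummy sum standing in for real work) */
        double local = 0.0;
        for (long i = 0; i < 50 * 1000 * 1000; i++)
            local += (double)(i % 100);

        /* Reduce everyone's result to process 0 and time it */
        double t0 = MPI_Wtime();
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        if (rank == 0)
            printf("reduce across %d processes took %g s, total = %g\n",
                   size, t1 - t0, total);

        MPI_Finalize();
        return 0;
    }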
On Tue, 30 Mar 2004 dburbano_at_[hidden] wrote:
> When you run multiple processes on one processor and they want to
> communicate with each other, all of their communication and
> computation has to go through that same processor, so everything
> takes more time than it would on many processors.
>
> For example, suppose I have 4 processes and only one processor, and
> they have to do some computation and some communication (reducing
> their results to process 0). The processes cannot communicate with
> each other until they have finished their computation; so each
> process (p0, p1, p2, p3) takes its turn on the processor. The
> processor cannot execute 2 processes at the same time; it can only
> interleave them by time slicing. As a result, the last processes in
> the queue have to wait until the first processes free the processor.
>
> The same thing happens with the communication: the processes need the
> processor in order to send or receive information. If you are using
> MPI_Reduce and process p0 is the last to get the processor, the other
> processes will try to send their information to p0, but p0 may not
> yet be ready to receive it.
>
> Now, what happens if you have four processors and four processes?
> There is one processor for each process, so the computation is
> executed at the same time; the processes don't share their
> processors, and they don't wait to use a processor. When they finish
> their computation, they start to send their information to each
> other. With MPI_Reduce, they all send their information to one
> process (for example p0), which has a processor of its own. In this
> case the process that is receiving the information is ready to
> receive the data from the others (sometimes it is not ready, but that
> depends on many factors), and the others can send their information
> at the same time (depending on the hardware configuration).
>
> This is one reason among many that communication and computation take
> less time with many processes on many processors than with many
> processes on one processor.
>
>
> This is a good link:
>
> http://www.cs.rit.edu/~ncs/parallel.html#books
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/