Thank you for your answer to my question. I have already compared the
performance of the serial and parallel programs using the serial
version of the code. However, the serial code is not optimized (so far),
and for larger problems I have obtained superlinear speedup.
As far as I know, people compare the performance of different
algorithms even when superlinear speedup is observed. I will try to
improve my sequential algorithm for that reason.
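
For reference, a minimal sketch of the quantities discussed here,
assuming the wall-clock times t(1) and t(p) are measured separately.
The function names and the timing values are hypothetical and are not
measurements from this thread.

```python
def speedup(t_serial, t_parallel):
    """Speedup t(1)/t(p): serial time divided by parallel time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, p):
    """Parallel efficiency: speedup divided by the processor count p."""
    return speedup(t_serial, t_parallel) / p


def is_superlinear(t_serial, t_parallel, p):
    """True when the speedup exceeds the number of processors."""
    return speedup(t_serial, t_parallel) > p


# Hypothetical timings in seconds (assumed values, for illustration only).
t1 = 100.0  # serial run
tp = 20.0   # parallel run on p processors
p = 4

print("speedup:", speedup(t1, tp))
print("efficiency:", efficiency(t1, tp, p))
print("superlinear:", is_superlinear(t1, tp, p))
```

An efficiency above 1 corresponds to the superlinear case; a serial
baseline that is not yet optimized is one possible cause.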
Thanks for your help,
Slawomir
Jeff Squyres wrote:
>This is not a good mechanism for measuring the difference. In this
>case, you're going to be running 4 processes on one node (2 CPUs, in
>your case). I'm guessing that all 4 processes will attempt to fully
>utilize their CPUs, leading to the OS having to schedule all 4 on 2
>resources -- leading to process swappage, and potentially unnecessary
>data movement between your processors (e.g., if process A moves between
>both CPUs). You also need to consider how much memory each process
>takes -- will the sum of all 4 processes exceed the physical memory of
>your machine? If so, you'll also incur a lot of memory thrashing.
>This will potentially be a *lot* of overhead.
>
>The general rule of thumb is: running N processes on M processors
>(where N > M) simultaneously will always take more time than running M
>processes at a time (until you have run a total of N processes) because
>of process and/or memory thrashing.
>
>If you want to compare serial performance vs. parallel performance, you
>really need to have a serial version of your code -- one that can run
>in a single process (or, if you're comparing by node and not by CPU,
>one that can run in 2 processes since you have 2 CPUs in a node -- but
>be sure to take memory constraints into consideration!). Then compare
>the performance of that vs. your parallel runs.
>
>Hope this helps.
>
>
>On Jun 15, 2005, at 6:15 AM, Slawomir Kubacki wrote:
>
>
>
>>Dear Sir, Madam,
>>
>>I want to measure the efficiency of a parallel program that uses a
>>domain decomposition approach.
>>The speedup, t(1)/t(p), is the ratio of the execution time on one
>>processor to the execution time on p processors.
>>In order to measure the time t(1), it is necessary to run the parallel
>>program (formally decomposed into p subdomains) on one processor.
>>For example, I am running a problem decomposed into four subdomains
>>on one node, n1:
>>
>>mpirun -c 4 n1 program_name
>>
>>As we have 2 processors on one node, I am not certain whether one or
>>two processors are actually used when the parallel program runs on
>>one node.
>>Please indicate how to force the program to run on one processor
>>only.
>>
>>regards,
>>
>>Slawomir Kubacki
>>_______________________________________________
>>This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>>
>>
>
>
>