On Fri, 26 Sep 2003, Wa-Kun Lam wrote:
> >>FFTs are communication dominated. It is not at all surprising that
> >>a small FFT runs faster on one node than distributed. What is your
> >>network?
> >
> >I am running Redhat 7.2. All nodes are connected by a HUB. Every machine
> >equipped with 1.4G cpu, 256 RAM.
Are you sure it is a hub, rather than a switch? A switch would be much
better for FFTW's communication patterns (All-to-all).
And what speed network technology is used? 100 Mb/s (Fast Ethernet),
1 Gb/s (Gigabit Ethernet), etc.?
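To put rough numbers on the hub-versus-switch difference, here is a back-of-the-envelope sketch in Python. It is an idealized model, not a measurement: it assumes every node sends one equal-sized block to every other node, that a hub makes all traffic share a single link's bandwidth, and that a switch is non-blocking and full duplex. Collisions, latency, and protocol overhead are ignored, and the function name, node count, and block size are made up for illustration.

```python
def all_to_all_seconds(nodes, block_bytes, link_bits_per_s, shared_medium):
    """Estimate the time for one all-to-all exchange (idealized model)."""
    # Each node sends one block to every other node.
    per_node_bytes = (nodes - 1) * block_bytes
    if shared_medium:
        # Hub: every node's traffic competes for the same wire.
        bytes_on_bottleneck = nodes * per_node_bytes
    else:
        # Ideal switch: each node only has to push its own data out its own port.
        bytes_on_bottleneck = per_node_bytes
    return bytes_on_bottleneck * 8 / link_bits_per_s


# Assumed example: 4 nodes, 1 MB blocks, Fast Ethernet (100 Mb/s).
hub = all_to_all_seconds(4, 1_000_000, 100e6, shared_medium=True)
switch = all_to_all_seconds(4, 1_000_000, 100e6, shared_medium=False)
print(f"hub: {hub:.2f} s, switch: {switch:.2f} s")
```

In this model the hub is slower by a factor equal to the number of nodes, so the penalty grows as nodes are added.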
Unfortunately, parallel machines (esp. beowulf clusters) are not a
guarantee of speedup for all algorithms, even ones that are tuned
for high latency communication links. If the problem is too small,
it just won't benefit from large grain parallelism. A discussion
of Amdahl's Law is outside the scope of the LAM mailing list, but
the short short version is that to get useful speedups you have to
balance the overhead of communications with similar amounts of
independent computations.
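As a rough illustration of that balance, the Python sketch below compares the textbook Amdahl's Law speedup with a toy model that adds a communication cost. The second function is a simplification: it assumes the communication time is a fixed amount paid whenever more than one node is used. The function names and all timings are hypothetical, not measurements of FFTW or of any cluster.

```python
def amdahl_speedup(parallel_fraction, nodes):
    """Amdahl's Law: speedup when only part of the work can be parallelized."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


def speedup_with_comm(compute_s, comm_s, nodes):
    """Toy model: compute time divides across nodes, communication time does not."""
    if nodes == 1:
        return 1.0  # a single node does no communication
    return compute_s / (compute_s / nodes + comm_s)


# Small problem: communication costs as much as the whole computation.
print(speedup_with_comm(compute_s=1.0, comm_s=1.0, nodes=4))    # below 1: a slowdown
# Large problem: communication is small next to the computation.
print(speedup_with_comm(compute_s=100.0, comm_s=1.0, nodes=4))  # close to 4
print(amdahl_speedup(parallel_fraction=0.5, nodes=2))
```

With these assumed numbers the small problem runs slower on four nodes than on one, which matches the behavior reported for a small distributed FFT.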
For the specific case of FFTW, a simple Google search for
"fftw MPI benchmark" yielded this brief but helpful page:
http://olympus.het.brown.edu/cgi-bin/info2www?(fftw)MPI+Tips
Good luck.
--
Tim Mattox - tmattox_at_[hidden] - http://homepage.mac.com/tmattox/
http://aggregate.org/KAOS/ - http://advogato.org/person/tmattox/