This doesn't sound right -- from your description, it sounds like you
might have a communication-bound application. Is your app sending a
lot of messages, possibly in a blocking pattern? I.e., do you have
processes waiting for data from other processes, such that the majority
of your run time is spent sending or waiting for data, and little
time is spent actually computing what you're supposed to be computing?
Run times that grow and CPU load averages that drop toward zero as you
add machines are consistent with that: the processes sit idle, waiting
on the network.

If you're communicating over TCP, for example, you want to overlap
communication and computation as much as possible (because TCP is
traditionally a high-latency environment): start the send or receive,
do computation that doesn't depend on the data in flight, and once the
transfer has finished, do the computation that needs that data.
Optimizations like this can help performance a lot, but are typically
very application-specific.

These are the types of performance bottlenecks that you want to look
for. Hope this helps; good luck.

On Nov 5, 2004, at 3:50 PM, Brian A Powell wrote:
> Hello all,
>
> I am attempting to run a parallel program on a 5-machine cluster and am
> experiencing incredibly slow execution (roughly 10,000 times slower than
> when done on a single machine). The basic program structure consists of
> 4 slaves performing a computation and sending their results to a master,
> which then does some analysis and writes these results to file. The
> results are passed as a struct which can accommodate 168 kB but is
> usually filled with 55 kB. Is this too large? As I add machines to the
> cluster, the execution time increases dramatically, and CPU load
> averages drop to virtually nothing.
>
> Thanks,
>
> Brian Powell
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/