
LAM/MPI General User's Mailing List Archives


From: Kumar, Ravi Ranjan (rrkuma0_at_[hidden])
Date: 2005-03-30 12:48:43


Hello,

I wrote a code in C++ using MPI. I divided a big block into smaller
blocks and assigned each block to a different node/process. Below is the
pseudocode:

for (time = 1; time <= Nt; time++)
{
    do {
        // some data exchange between neighbouring blocks (nodes/processes)
        // some computation in each block (node/process)
        MPI_Allreduce(...to find the convergence condition...);
    } while (convergence not reached);
}
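
To be a bit more concrete, one sweep of the do-while loop looks roughly
like this (the buffer and function names are just placeholders; I am
assuming a 1-D decomposition where left/right are the neighbour ranks,
with MPI_PROC_NULL at the ends):

    // exchange one layer of halo cells with each neighbour; MPI_Sendrecv
    // avoids the send/recv ordering deadlocks of plain MPI_Send/MPI_Recv
    MPI_Status status;
    MPI_Sendrecv(send_left,  n, MPI_DOUBLE, left,  0,
                 recv_right, n, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, &status);
    MPI_Sendrecv(send_right, n, MPI_DOUBLE, right, 1,
                 recv_left,  n, MPI_DOUBLE, left,  1,
                 MPI_COMM_WORLD, &status);

    update_block(u);                        // computation on my block
    double local_err = block_residual(u);   // convergence measure on my block

    // combine the per-block residuals so every process sees the global one
    double global_err;
    MPI_Allreduce(&local_err, &global_err, 1, MPI_DOUBLE, MPI_MAX,
                  MPI_COMM_WORLD);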

The results from the parallel code agree quite well with those from the serial
code. However, the scalability is poor: when I increase the number of processes
or nodes, there is not much improvement in the turnaround time. The serial code
takes 225 seconds, whereas under the same conditions the parallel code with:

3 processes on a single node takes 167 seconds,
3 processes on 3 nodes takes 115 seconds,
4 processes on 4 nodes takes 111 seconds,
10 processes on 5 nodes takes 88 seconds.
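
For reference, the speedups over the serial run work out to 225/167 ≈ 1.35,
225/115 ≈ 1.96, 225/111 ≈ 2.03, and 225/88 ≈ 2.56, i.e. only about 26%
parallel efficiency on 10 processes.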

Is this much scalability acceptable? I was expecting better performance, at
least 10 times faster. Is there any way I can improve the speedup? I am using
LAM-MPI on a Linux (Linux k00 2.4.17) cluster connected via Ethernet (TCP/IP;
I do not know much about networking). Below are the cluster details:

20 + 2 spare nodes
44 1.4 GHz Athlon processors
512 MB RAM per processor
Channel-bonded network:
    four 24-way switches
    four NICs per node
    3-way+1 NFS or 4-way
40 GB hard drive per node
Theoretical peak performance: 224 GFLOPS

Is performance related to blocking vs. non-blocking send/receive too? By the
way, I am using blocking MPI_Send/MPI_Recv. Or is it related to communication
overhead? How can this communication overhead be reduced? Please give me some
ideas.
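
One idea I had is to replace the blocking calls with non-blocking
MPI_Isend/MPI_Irecv, so that the computation on the interior of each block
(which does not need the halo data) overlaps with the boundary exchange. A
rough sketch, again with placeholder names and the same 1-D left/right
decomposition:

    MPI_Request reqs[4];
    MPI_Status  stats[4];

    // post all receives and sends up front instead of blocking on each one
    MPI_Irecv(recv_left,  n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_right, n, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(send_right, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(send_left,  n, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[3]);

    update_interior(u);            // interior cells need no halo data

    MPI_Waitall(4, reqs, stats);   // halo data has now arrived

    update_boundary(u);            // boundary cells use the received halo

I have also wondered whether calling MPI_Allreduce only every few do-while
iterations (instead of every iteration) would cut down the number of global
synchronizations, at the cost of possibly doing a few extra sweeps. Would
either of these help?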

Thanks in advance,
Ravi R. Kumar