
LAM/MPI General User's Mailing List Archives


From: Kumar, Ravi Ranjan (rrkuma0_at_[hidden])
Date: 2005-03-30 11:33:11


Hello Jeff,

Thanks a lot for clarifying my doubts. I fixed all the errors and my code is
now running smoothly. There is still a performance-related issue; I will send
my queries in another thread.

Thanks,
Ravi R. Kumar

Quoting Jeff Squyres <jsquyres_at_[hidden]>:

> On Mar 29, 2005, at 5:57 PM, Kumar, Ravi Ranjan wrote:
>
> > However, I have another doubt. I ran my code using 2 nodes (K00 & K02)
> > in LAM. I used 10 processes to run my code, hence 5 processes per node:
> >
> > [rrkuma0_at_k00 SOR]$ time mpirun -v -np 10 10Blocking_Dynamic_SOR_MPI
> > 5491 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5513 10Blocking_Dynamic_SOR_MPI running on n1
> > 5492 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5514 10Blocking_Dynamic_SOR_MPI running on n1
> > 5493 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5515 10Blocking_Dynamic_SOR_MPI running on n1
> > 5494 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5516 10Blocking_Dynamic_SOR_MPI running on n1
> > 5495 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5517 10Blocking_Dynamic_SOR_MPI running on n1
> >
> > I was watching the output on the terminal. For the first few time
> > steps, all the ranks were printing together. Later on I found that
> > all the even-numbered processes printed their output together and
> > the odd-numbered processes printed their output together. The
> > even-numbered processes finished their job much earlier than the
> > odd-numbered ones, and the time they took was much less. What can
> > be the reason for this difference? Why do the odd-numbered
> > processes take so long? I don't think I introduced any workload
> > difference between even- and odd-numbered processes in my code.
> > The only distinction is in the data-exchange subroutine: an
> > even-numbered rank first sends data and then receives, whereas an
> > odd-numbered rank first receives and then sends. That's all.
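
With blocking calls, that even/odd ordering is the standard way to keep
the two sides of a pair from both sitting in MPI_Send at the same time.
A minimal sketch of the pattern, assuming each rank trades one buffer
with a single partner rank (the names here are illustrative, not taken
from the original code):

    /* Even ranks send first and then receive; odd ranks do the
       reverse, so the two ends of each pair never both block in
       MPI_Send.  `partner` and the buffers are placeholders. */
    #include <mpi.h>

    void exchange(double *send_buf, double *recv_buf, int count,
                  int partner, MPI_Comm comm)
    {
        int rank;
        MPI_Comm_rank(comm, &rank);

        if (rank % 2 == 0) {
            MPI_Send(send_buf, count, MPI_DOUBLE, partner, 0, comm);
            MPI_Recv(recv_buf, count, MPI_DOUBLE, partner, 0, comm,
                     MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(recv_buf, count, MPI_DOUBLE, partner, 0, comm,
                     MPI_STATUS_IGNORE);
            MPI_Send(send_buf, count, MPI_DOUBLE, partner, 0, comm);
        }
    }

MPI_Sendrecv expresses the same exchange in a single call and avoids
having to reason about the ordering at all.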
>
> There are two issues here:
>
> 1. You really can't rely on the ordering of output. So regardless of
> what it *looks* like, you can't tell -- via printf-style output -- in
> what order things actually finished. Getting the *real* order is
> pretty hard; you have to account for the clock differences between
> cluster nodes, etc. If all your nodes sync to a common NTP server
> (for example), it's a lot easier, but there may still be small
> differences.
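
If the goal is to compare how long each rank actually took, a more
reliable approach than watching the output order is to time each rank
with MPI_Wtime (elapsed intervals don't require the node clocks to
agree) and gather the numbers on one rank. A rough sketch, with the
actual SOR work omitted:

    /* Time each rank's own work and collect the per-rank elapsed
       times on rank 0, instead of inferring anything from the order
       in which printf output appears. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    void report_times(MPI_Comm comm)
    {
        int rank, size;
        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);

        double t0 = MPI_Wtime();
        /* ... the SOR iterations would run here ... */
        double elapsed = MPI_Wtime() - t0;

        double *all = NULL;
        if (rank == 0)
            all = malloc(size * sizeof(double));
        MPI_Gather(&elapsed, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0, comm);

        if (rank == 0) {
            for (int i = 0; i < size; i++)
                printf("rank %d: %.3f seconds\n", i, all[i]);
            free(all);
        }
    }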
>
> 2. When you oversubscribe a node, you really can't compare performance
> at all. There are too many factors involved once you start thrashing
> the CPU and memory subsystems, etc.
>
> > Again, I ran my code using 10 processes on a single node. All the
> > processes ended at the same time. See below:
> >
> > [rrkuma0_at_k00 SOR]$ time mpirun -v -np 10 10Blocking_Dynamic_SOR_MPI
> > 5575 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5576 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5577 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5578 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5579 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5580 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5581 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5582 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5583 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > 5584 10Blocking_Dynamic_SOR_MPI running on n0 (o)
> > Tue Mar 29 17:45:10 2005
> >
> > The time taken by 10 processes on a single node is 3 minutes,
> > whereas the time taken by 10 processes distributed over 2 nodes is
> > 5 minutes. Why is this happening? Kindly clarify. Thanks a lot!
>
> Keep in mind that MPI communication takes time. When it's all on one
> node, it's done via shared memory and is very fast. When it's done
> across multiple nodes, I suspect you're using a TCP network, and that's
> orders of magnitude slower. So every Allreduce, Barrier, Send, Recv,
> etc., takes time. Parallel codes are typically designed quite
> carefully to minimize communication whenever possible, and/or to
> overlap communication with computation.
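
For an SOR-style halo exchange, the usual way to get that overlap is to
post nonblocking receives and sends, update the interior points that
don't depend on the incoming halo, and only then wait and finish the
boundary rows. A minimal sketch; the neighbour ranks, halo buffers, and
the two update routines are placeholders, not taken from the original
program:

    /* Overlap the halo exchange with the interior update using
       nonblocking MPI calls. */
    #include <mpi.h>

    static void update_interior(void) { /* interior SOR sweep (omitted) */ }
    static void update_boundary(void) { /* rows that need the halo (omitted) */ }

    void sor_step(double *halo_send, double *halo_recv, int count,
                  int up, int down, MPI_Comm comm)
    {
        MPI_Request req[2];

        /* Post the exchange first... */
        MPI_Irecv(halo_recv, count, MPI_DOUBLE, up,   0, comm, &req[0]);
        MPI_Isend(halo_send, count, MPI_DOUBLE, down, 0, comm, &req[1]);

        /* ...do the work that doesn't need the halo while it is in flight... */
        update_interior();

        /* ...then wait for the exchange and finish the boundary rows. */
        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);
        update_boundary();
    }

The per-message latency over TCP is still paid, but hiding it behind the
interior computation can recover a good part of the gap.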
>
> Here's a page that I wrote a long, long time ago that explains some of
> this kind of stuff: http://www.osl.iu.edu/~jsquyres/bladeenc/ See the
> Technical Details page.
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>