
LAM/MPI General User's Mailing List Archives


From: Timothy G Thompson (Timothy.G.Thompson_at_[hidden])
Date: 2006-04-17 20:18:02


Brian,

Thanks for your reply... the problem is fixed (as you suggested) by adding
an argument to mpirun:
    mpirun -ssi rpi tcp -v pathSchema

Interestingly (I suppose), when I tried mpirun -ssi rpi usysv -v pathSchema
(and likewise with sysv), things failed in MPI_Init.

For one particular benchmark, I got the following timings before and after
the problem was resolved (18 cpus, all running at about the same rate,
looking at one rank's thread timings, in seconds):

  
                              prior      w/tcp
    main       wall          455.21     333.28
               virtual       338.13     332.00
    rusage     user+sys      437.22     332.29
               user          379.76     332.11
               sys            57.45       0.17
    total      wall          167.61      21.78
               virtual        41.62       0.10
    post       wall            1.39       0.01
               virtual         0.008      0.008
               sys             0.007      0.0008
    wait       wall           79.31       8.68
               virtual        20.13       0.009
               sys            25.08       0.015
    start      wall           84.86      13.00
               virtual        21.01       0.033
               sys            31.63       0.07
    complete   wall            1.92       0.033
               virtual         0.43       0.011
               sys             0.65       0.02

Note that in the 'prior' run the difference between main's wall and virtual
time is 117.08 seconds, and summing the virtual and system times of
post/wait/start/complete gives 98.94 seconds of CPU time consumed
(roughly 41.58 s virtual plus roughly 57.37 s system)!!
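
For reference, here is a minimal sketch (not the actual timer code behind the
numbers above, which isn't shown in this thread) of how the three kinds of
clocks in that table can be sampled on Solaris: gethrtime() for wall time,
gethrvtime() for per-thread virtual time (Solaris only, as noted in the
quoted message below), and getrusage() for process user/system time. The
struct and helper names here are illustrative only.

    /* Illustrative timer sampling for Solaris; not the poster's code. */
    #include <sys/time.h>      /* gethrtime(), gethrvtime(), hrtime_t  */
    #include <sys/resource.h>  /* getrusage()                          */

    typedef struct {
        hrtime_t wall_ns;      /* wall-clock time, nanoseconds         */
        hrtime_t virt_ns;      /* per-thread (LWP) virtual CPU time    */
        double   user_s;       /* process user time from rusage        */
        double   sys_s;        /* process system time from rusage      */
    } timer_sample_t;

    static double tv_to_sec(struct timeval tv)
    {
        return (double) tv.tv_sec + (double) tv.tv_usec / 1e6;
    }

    static timer_sample_t take_sample(void)
    {
        timer_sample_t s;
        struct rusage ru;

        s.wall_ns = gethrtime();   /* high-resolution wall clock        */
        s.virt_ns = gethrvtime();  /* Solaris-only virtual (CPU) timer  */
        getrusage(RUSAGE_SELF, &ru);
        s.user_s = tv_to_sec(ru.ru_utime);
        s.sys_s  = tv_to_sec(ru.ru_stime);
        return s;
    }

    /* Bracket each MPI call (e.g. MPI_Win_wait) with take_sample() and
     * accumulate the deltas per call site to get the wall/virtual/sys
     * columns in the table above. */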

laminfo shows (for both the solaris sparc and solaris x86 builds):
...Thread support: yes
...ROMIO support: yes
...IMPI support: no
...SSI rpi: crtcp (API v1.1 module v1.1)
...SSI rpi: lamd (API v1.0 module v7.1)
...SSI rpi: sysv (...)
...SSI rpi: tcp (...)
...SSI rpi: usysv (...)

Huge improvement, THANKS!!
Tim Thompson

Brian Barrett <brbarret_at_[hidden]>
Sent by: lam-bounces_at_[hidden]
04/14/2006 12:18 PM
Please respond to: General LAM/MPI mailing list <lam_at_[hidden]>
To: General LAM/MPI mailing list <lam_at_[hidden]>
Cc: Ronald S Clifton <Ronald.S.Clifton_at_[hidden]>,
    Matthew P Ferringer <Matthew.P.Ferringer_at_[hidden]>
Subject: Re: LAM: post-start-complete-wait performance issue

On Apr 13, 2006, at 9:40 AM, Timothy G Thompson wrote:

> I'm a developer working on a research effort using LAM/MPI with
> multi-objective genetic algorithms. I've developed an "asynchronous
> island" parallelization model using one-sided communication with
> post-start-complete-wait as the synchronization mechanism.
> We're running on a heterogeneous environment with linux boxes
> (Redhat 9), Sun sparc boxes (Solaris 2.9) and Sun x86 boxes
> (Solaris 2.10). We're using LAM 7.1.1, with the downloaded linux
> RPM, and the Sun executables being built from source.
>
> I've got detailed timers in place (wall; virtual: Sun only, thread
> local; and rusage: process virtual and system). These timers show
> that the post-start-complete-wait calls on the Sun boxes have very
> poor performance (a large amount of virtual AND system time being
> consumed during these four LAM calls), whereas the linux boxes
> show very low (good) overhead.
>
> I'd welcome any insight you can provide that might explain
> these differing results. Are the differences in the OS, my LAM
> build, and/or the linux RPM install causing this? Is the LAM code
> somehow blocking on linux and polling on Solaris?

That's an unusual finding. We don't do anything differently for the
post/wait/start/complete synchronization on Solaris and Linux; both
are implemented over our point-to-point communication routines. Is
LAM/MPI built the same on both machines? (You might want to look at
the output of laminfo to see if the same set of components is
available.) You might also want to try using tcp instead of usysv or
sysv for the transport engine - it might help calm the Solaris boxes
down (but that's just a guess). Can you use the profiling tools
available with Solaris to see where LAM is spending all its time when
it is behaving badly?
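
For context, here is a minimal, generic sketch of the post/start/complete/
wait synchronization being discussed. It is NOT the island-model code from
the original message; the buffer, the ranks and the single MPI_Put are
illustrative only. As noted above, in LAM/MPI these epochs are layered over
the point-to-point rpi module, so the "mpirun -ssi rpi tcp" selection
affects them as well.

    /* Generic MPI-2 post/start/complete/wait (PSCW) sketch.
     * Rank 0 is the target (MPI_Win_post / MPI_Win_wait);
     * rank 1 is the origin (MPI_Win_start / MPI_Put / MPI_Win_complete). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        double buf[4] = {0.0, 0.0, 0.0, 0.0};
        MPI_Win win;
        MPI_Group world_group, peer_group;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);

        /* Window creation is collective; only rank 0's memory is written. */
        MPI_Win_create(buf, sizeof(buf), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        if (rank == 0 && nprocs >= 2) {
            int origin = 1;                    /* rank allowed to access us  */
            MPI_Group_incl(world_group, 1, &origin, &peer_group);
            MPI_Win_post(peer_group, 0, win);  /* open exposure epoch        */
            MPI_Win_wait(win);                 /* block until origin is done */
            printf("rank 0 received %g\n", buf[0]);
            MPI_Group_free(&peer_group);
        } else if (rank == 1) {
            int target = 0;                    /* rank whose window we write */
            double value = 42.0;
            MPI_Group_incl(world_group, 1, &target, &peer_group);
            MPI_Win_start(peer_group, 0, win); /* open access epoch          */
            MPI_Put(&value, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);
            MPI_Win_complete(win);             /* close access epoch         */
            MPI_Group_free(&peer_group);
        }

        MPI_Group_free(&world_group);
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }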

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have a LAM/MPI day: http://www.lam-mpi.org/
_______________________________________________
This list is archived at http://www.lam-mpi.org/MailArchives/lam/