Hello,
I'm a developer working on a research effort using LAM/MPI with
multi-objective genetic algorithms. I've developed an "asynchronous
island" parallelization model using one-sided communication with
post-start-complete-wait as the synchronization mechanism. We're running
in a heterogeneous environment with Linux boxes (Red Hat 9), Sun SPARC
boxes (Solaris 2.9) and Sun x86 boxes (Solaris 2.10). We're using LAM
7.1.1, with the downloaded Linux RPM, and the Sun executables built
from source.
I've got detailed timers in place: wall-clock, virtual (thread-local, Sun
only), and rusage (process virtual and system). These timers show that the
post-start-complete-wait calls on the Sun boxes perform very poorly
(a large amount of virtual AND system time is consumed during these four
LAM calls), whereas the Linux boxes show very low (good) overhead.
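For reference, here is a minimal sketch of the kind of rusage-based timing
described above. It is not the actual instrumentation from my code: the
names timing_snapshot, take_snapshot and elapsed are hypothetical, and it
covers only the wall-clock and process-level rusage timers, not the
Sun-only thread-local virtual timer. One snapshot is taken before and one
after each of the four synchronization calls (MPI_Win_post, MPI_Win_start,
MPI_Win_complete, MPI_Win_wait), and the difference is recorded.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Hypothetical snapshot type: wall-clock time plus process user
 * ("virtual") and system CPU time, all in seconds. */
typedef struct {
    double wall;
    double user;
    double sys;
} timing_snapshot;

/* Convert a struct timeval to seconds as a double. */
double tv_to_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Fill *t with the current wall-clock time and the CPU time consumed
 * so far by this process. Returns 0 on success, -1 on failure. */
int take_snapshot(timing_snapshot *t)
{
    struct timeval now;
    struct rusage ru;

    assert(t != NULL);
    if (gettimeofday(&now, NULL) != 0)
        return -1;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;

    t->wall = tv_to_seconds(now);
    t->user = tv_to_seconds(ru.ru_utime);
    t->sys  = tv_to_seconds(ru.ru_stime);
    return 0;
}

/* Time consumed between two snapshots. */
timing_snapshot elapsed(timing_snapshot before, timing_snapshot after)
{
    timing_snapshot d;
    d.wall = after.wall - before.wall;
    d.user = after.user - before.user;
    d.sys  = after.sys  - before.sys;
    return d;
}

/* Intended use (MPI calls shown as comments so the sketch builds
 * without an MPI installation):
 *
 *   timing_snapshot t0, t1, d;
 *   take_snapshot(&t0);
 *   // MPI_Win_post(...); MPI_Win_start(...);
 *   // ... MPI_Put / MPI_Get ...
 *   // MPI_Win_complete(...); MPI_Win_wait(...);
 *   take_snapshot(&t1);
 *   d = elapsed(t0, t1);
 */
```

With this kind of measurement, a call that sleeps in the kernel while it
waits would be expected to show wall time growing with little user or
system time, while a call that spins would show user and/or system time
growing along with wall time. The latter is the pattern I see on the Sun
boxes.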
I'd appreciate any insight you can provide that might explain these
differing results. Are the differences in the OS, my LAM build, and/or
the Linux RPM install causing this? Is the LAM code somehow blocking on
Linux and polling on Solaris?
Thanks, Tim Thompson