
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2006-05-12 07:28:56


Greetings Tom.

Your tests are correct; this is an oddity in LAM. MPI_COMM_SELF
actually requires the lam_basic coll module, because the shmem and smp
modules cannot be used on COMM_SELF.
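
To make the COMM_SELF point concrete: collectives on MPI_COMM_SELF are
perfectly legal MPI, so LAM has to have a coll module selected for that
communicator just like for any other. Here's a minimal sketch, just
plain MPI with nothing LAM-specific assumed:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int value = 42;

        MPI_Init(&argc, &argv);
        /* A broadcast on MPI_COMM_SELF is trivially local, but a coll
           module still has to be attached to this communicator, which
           is why lam_basic needs to stay in the list. */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_SELF);
        printf("value = %d\n", value);
        MPI_Finalize();
        return 0;
    }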

So you really need to run:

    mpirun -ssi coll lam_basic,shmem ...
    mpirun -ssi coll lam_basic,smp ...

This is not quite desirable because you want to test just the shmem or
smp modules, of course, but LAM's coll selection is unfortunately not
that fine-grained. However, the shmem and smp modules, when available,
should always be selected over lam_basic. You can verify this by
setting the coll_verbose SSI parameter high enough to see the selection
process:

    mpirun -ssi coll_verbose level:1000 ...

It is also quite possible/likely that MPICH's collective algorithms will
perform better than LAM's. LAM has the nice MagPIe-style stuff for SMP
and some decent shared memory collectives, but nothing is
extraordinarily well tuned (lam_basic is the standard linear/logarithmic
stuff that has been around for years; it is functional and correct, but
nothing special in terms of performance). Much more effort went into
Open MPI 1.1.x's collective algorithms.
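
If you want to quantify the difference rather than infer it from the
application's overall runtime, a simple timing loop around the dominant
collective will tell you whether the collectives themselves are the
bottleneck. A rough sketch along those lines, using MPI_Allreduce as a
stand-in (the message size, iteration count, and reduction operation
below are arbitrary placeholders, not tuned values):

    #include <stdio.h>
    #include <stdlib.h>
    #include <mpi.h>

    #define COUNT 4096    /* doubles per allreduce; arbitrary */
    #define ITERS 100     /* timed repetitions; arbitrary */

    int main(int argc, char **argv)
    {
        int rank, i;
        double *in, *out, start, elapsed;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        in = (double *) malloc(COUNT * sizeof(double));
        out = (double *) malloc(COUNT * sizeof(double));
        for (i = 0; i < COUNT; ++i)
            in[i] = (double) i;

        /* One warm-up call, then time ITERS iterations. */
        MPI_Allreduce(in, out, COUNT, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);
        MPI_Barrier(MPI_COMM_WORLD);
        start = MPI_Wtime();
        for (i = 0; i < ITERS; ++i)
            MPI_Allreduce(in, out, COUNT, MPI_DOUBLE, MPI_SUM,
                          MPI_COMM_WORLD);
        elapsed = MPI_Wtime() - start;

        if (rank == 0)
            printf("average MPI_Allreduce time: %g sec\n",
                   elapsed / ITERS);

        free(in);
        free(out);
        MPI_Finalize();
        return 0;
    }

Building and running the same source against LAM (with the coll lists
above) and against MPICH should make the comparison direct.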

> -----Original Message-----
> From: lam-bounces_at_[hidden]
> [mailto:lam-bounces_at_[hidden]] On Behalf Of Tom Crockett
> Sent: Thursday, May 11, 2006 3:40 PM
> To: lam_at_[hidden]
> Subject: LAM: Trouble Specifying SSI Collectives
>
> Hi,
>
> We have an application that is exhibiting very poor performance in a
> section of code dominated by calls to various MPI collectives. The
> performance we're seeing with LAM 7.1.2 is much worse than with
> MPICH 1.2.6.
>
> In an attempt to figure out where the trouble is, I decided to
> compare the performance of LAM's different collective modules by
> explicitly specifying "-ssi coll xxx" on my mpirun command. This works
> fine for lam_basic, but with both smp and shmem, I get complaints from
> LAM:
>
> "No SSI coll modules said that they were available to run.
> This should
> not happen."
>
> The smp test case is running with eight processes spread across four
> dual-processor nodes (two per node); the shmem test uses four
> processes on a single quad-processor node.
>
> I double-checked the log files from my LAM build, and the smp and
> shmem modules both configured and compiled cleanly. lamtests-7.1.2
> runs successfully as well.
>
> I suspect there's something simple I've overlooked, and I'm hoping
> someone on the list can enlighten me. Here's the mpirun command I use
> with "smp":
>
> /usr/local/v9a/generic/lam-7.1.2/bin/mpirun -ssi boot rsh \
>     -ssi rpi usysv -ssi coll smp -ssi coll_base_associative 1 \
>     -ssi ssi_verbose stdout -nsigs -pty -w -wd ~tom/tests/oceanM \
>     -sa -v -nger /tmp/pbslam.app_schema.28722
>
> "shmem" is identical except I substitute "shmem" for "smp".
>
> Here are the options we used to configure LAM:
>
> ./configure \
> --prefix=/usr/local/v9a/generic/lam-7.1.2 \
> --with-boot=rsh \
> --with-rpi=usysv \
> --with-rsh=/bin/rsh \
> --with-rpi-gm=/usr/local/gm \
> --with-rpi-gm-lib=/usr/local/gm/lib/sparcv9 \
> --with-fd-size=4096
>
> We're running this under Solaris. LAM is built using Sun's Studio 11
> compiler suite. We see the same problem under Solaris 9 on UltraSPARC
> and Solaris 10 on x86/x64 (AMD64). Off-node communication in both
> cases is TCP/IP over Gigabit or Fast Ethernet with usysv.
>
> Any thoughts on what's going wrong and/or how to fix it would be
> greatly appreciated.
>
> -Tom
>
> --
> Tom Crockett
>
> College of William and Mary     email: tom_at_[hidden]
> Computational Science Cluster   phone: (757) 221-2762
> Savage House                    fax:   (757) 221-2023
> P.O. Box 8795
> Williamsburg, VA 23187-8795
>