Tuning LAM
If you don't want to read the rest of the instructions, the following
should do the trick for most situations:
% gunzip -c lam-6.5.9.tar.gz | tar xf -
% cd lam-6.5.9
% ./configure --prefix=/path/to/install/in
[...lots of output...]
% make
[...lots of output...]
% make install
[...lots of output...]
% make examples # This step is optional
[...lots of output...]
If you do not specify a prefix, LAM will first look for "lamclean"
in your path. If lamclean is found, it will use the parent of the
directory where lamclean is located as the prefix. Otherwise,
/usr/local is used (like most GNU software).
Now go read the RELEASE_NOTES file; it contains all the
information about the new features of this release of LAM/MPI.
Common causes of failure:
- No C++ compiler installed; use --without-mpi2cpp configure option
- C++ compiler does not support required C++ features; use
--without-mpi2cpp configure option
- No Fortran compiler installed; use --without-fc configure option
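As a rough pre-flight check for the failures listed above, the
following sketch looks for a C++ and a Fortran compiler and prints the
configure options that would work around a missing one. The fallback
compiler names (c++ and f77) are assumptions chosen for illustration;
LAM's configure script performs its own, more thorough search.

```shell
# Sketch only: suggest configure options for missing compilers.
# CXX and FC may be set in the environment; the fallbacks "c++" and
# "f77" are illustrative guesses, not the names LAM's configure uses.
suggest_configure_flags() {
    flags=""
    if ! command -v "${CXX:-c++}" >/dev/null 2>&1; then
        flags="$flags --without-mpi2cpp"
    fi
    if ! command -v "${FC:-f77}" >/dev/null 2>&1; then
        flags="$flags --without-fc"
    fi
    # Strip the leading space, if any, before printing.
    echo "${flags# }"
}

suggest_configure_flags
```

Note that this only detects a missing compiler. It cannot tell whether
an installed C++ compiler supports the features that the C++ bindings
require.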
The LAM distribution is packaged as a compressed tape archive,
lam-6.5.9.tar.Z, lam-6.5.9.tar.gz, or lam-6.5.9.tar.bz2. It is
available from the main LAM web site: http://www.lam-mpi.org.
Uncompress the archive and extract the sources.
% gunzip -c lam-6.5.9.tar.gz | tar xf -
or
% uncompress -c lam-6.5.9.tar.Z | tar xf -
or
% bunzip2 -c lam-6.5.9.tar.bz2 | tar xf -
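The choice among the three commands above depends only on the archive
suffix. A minimal sketch that prints the matching command for a given
file name (the function name unpack_cmd is made up for this example):

```shell
# Print the unpack pipeline that matches the archive suffix.
# Nothing is executed; the command is only echoed.
unpack_cmd() {
    case "$1" in
        *.tar.gz)  echo "gunzip -c $1 | tar xf -" ;;
        *.tar.Z)   echo "uncompress -c $1 | tar xf -" ;;
        *.tar.bz2) echo "bunzip2 -c $1 | tar xf -" ;;
        *)         echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

unpack_cmd lam-6.5.9.tar.bz2
```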
LAM/MPI will build on just about any POSIX system. There are,
however, a few restrictions:
Microsoft Windows is not a POSIX platform. LAM/MPI currently will
not build in a Windows environment.
It appears that GNU libtool does not presently support building
shared libraries on AIX. This has been tested on AIX 4.3.3; it is not
known if GNU libtool builds shared libraries on other versions of AIX.
Additionally, in some cases, GNU libtool apparently does not
function completely properly when using the "xlc" compiler. Use "cc",
instead (they are both the same compiler anyway).
Finally, there have been repeatable problems with AIX's "make"
when building ROMIO. This does not appear to be ROMIO's fault -- it
appears to be a bug in AIX's "make". The LAM Team suggests that you
use GNU "make" (ftp://ftp.gnu.org/gnu/make/) when building on AIX
platforms to avoid these problems.
The version of "make" that is distributed on some BSD systems
(e.g., FreeBSD) requires the use of the "-i" parameter to some of
LAM's make targets. For example:
make -i clean
It appears that the default C++ compiler on HP-UX
(CC) is a pre-ANSI standard C++ compiler. As such, it
will not build the C++ bindings package. The C++ compiler
aCC should be used to build the C++ bindings package.
The C++ compiler can be specified by specifying
--with-cxx=aCC as an option to configure.
LAM uses a GNU configure script to perform site and architecture
specific configuration.
Change directory to the top level LAM directory (lam-6.5.9) and run the configure script.
% ./configure {options}
or
% sh ./configure {options}
By default the configure script sets the LAM install directory to
the parent of where lamclean is found (if it is in your path), or
/usr/local if lamclean is not in your path. This can be overridden
with the --prefix option (see below).
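The default-prefix rule can be pictured with the sketch below. It is
an illustration of the behavior described above, not the code that
configure actually runs, and the function name guess_prefix is
hypothetical.

```shell
# Sketch of the default-prefix rule: parent of the directory that
# holds lamclean if it is in the path, otherwise /usr/local.
guess_prefix() {
    lamclean_path=$(command -v lamclean 2>/dev/null)
    if [ -n "$lamclean_path" ]; then
        # e.g. /opt/lam/bin/lamclean -> /opt/lam
        dirname "$(dirname "$lamclean_path")"
    else
        echo /usr/local
    fi
}

guess_prefix
```

Running this before configure shows where LAM would be installed if
--prefix were omitted, which is a quick way to catch an old LAM
installation lingering in the path.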
Note that the ROMIO package does not currently support many
GNU-like configure switches. In particular, attempting to use any of
the directory-specifying options (other than --prefix) will not work
as expected with ROMIO. ROMIO installs everything under
$(DESTDIR)$prefix. Hence, if you attempt to use switches such as
--libdir, --bindir, etc. to LAM's configure, all of the LAM (and the
C++ bindings) will install as expected, but ROMIO will still install
itself under $prefix.
The ROMIO authors have been notified of this issue.
The default is to build LAM/MPI with the C++ bindings, but without
C++ exception support.
Enabling C++ exceptions typically entails a slight degradation of
run-time performance because of extra bootstrapping required for every
function call (particularly with gcc/g++). As such, they are disabled
by default, and the MPI::ERRORS_THROW_EXCEPTIONS error handler will
only print out error messages. If full exception handling
capabilities are desired, LAM must be configured with the
"--with-exceptions" flag. It should be noted that some C++ (and C and
Fortran) compilers need additional command line flags to properly
enable exception handling.
For example, with gcc/g++ 2.95.2 and later, gcc, g77, and g++ all
require the command line flag "-fexceptions". gcc and g77 require
"-fexceptions" so that they can pass C++ exceptions through C and
Fortran functions properly. As such, all of LAM/MPI must be
compiled with the appropriate compiler options, not just the C++
bindings. Using MPI::ERRORS_THROW_EXCEPTIONS without having compiled
LAM with proper exception support will cause undefined behavior (read:
core dumps and other Bad Things).
If building with IMPI or the C++ bindings, LAM's configure script
will automatically guess the necessary compiler exception support
command line flags for the gcc/g++ and KCC compilers. That is, if a
user selects to build the MPI 2 C++ bindings and/or the IMPI
extensions, and also selects to build exception support, and g++ or
KCC is selected as the C++ compiler, the appropriate exceptions flags
will automatically be used.
Users with other compilers that require command line flags for
exception support should use the "--with-exflags=FLAGS" command line
switch to configure.
Note that this also applies even if you do not build the C++
bindings -- if LAM is to call C++ functions that may throw exceptions
(e.g., from an MPI error handler or other callback function), you need
to build LAM with the appropriate exceptions compiler flags.
A single vendor product line should be used to compile all of the
C, Fortran, and C++ code. That is, if gcc is used to compile LAM, g++
should be used to compile the C++ bindings, and gcc/g++/g77 should be
used to compile any user programs. Mixing multiple vendors' compilers
between different components of LAM/MPI and/or to compile user MPI
programs, particularly when using the C++ MPI bindings, is almost
guaranteed not to work.
C++ compilers are not link-compatible -- compiling the C++
bindings with one C++ compiler and compiling a user program that uses
the MPI C++ bindings with a different C++ compiler will almost
certainly produce linker errors.
Indeed, if exception support is enabled in the C++ bindings, it
will only work if the C and/or Fortran code knows how to pass C++
exceptions through their code. This will only happen properly when
the same compiler (or a single vendor's compiler product line, such as
gcc, g77, and g++) is used to compile all components --
LAM/MPI, the C++ bindings, and the user program. Using multiple
vendor compilers with C++ exceptions will almost certainly not work
(read: core dumps and other Bad Things).
The one possible exception to this rule (pardon the pun) is the
KCC compiler. Since KCC turns C++ code to C code and then gives it to
the back end "native" C compiler, KCC may work properly with the
native C and Fortran compilers.
Alternatively, LAM supports the "VPATH" building mechanism. If
LAM/MPI is to be installed in multiple environments that require
different options to configure, or require different compilers (such
as compiling for multiple architectures/operating systems), the
following form can be used to configure LAM:
% cd /some/temp/directory
% LAMTOP/configure {options}
where LAMTOP is the directory where the LAM/MPI distribution
tarball was expanded. This form will build the LAM executables and
libraries under /some/temp/directory and will not produce any files in
the LAMTOP tree. It allows multiple, concurrent builds of LAM/MPI
from the same source tree.
Note that you must have a VPATH-enabled "make" in order to use
this form. The GNU "make" (ftp://ftp.gnu.org/gnu/make/) supports
VPATH builds, for example, but the Solaris Workshop 5.0 "make" does
not. Parts of LAM/MPI may compile correctly in a VPATH build without
a VPATH-enabled "make", but ROMIO will not.
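When several build trees are configured from one source tree, the
VPATH procedure above can be wrapped in a small helper. This is a
sketch; the function name and the directory names in the comments are
hypothetical, and LAMTOP must be given as an absolute path because the
helper changes directory before invoking configure.

```shell
# vpath_configure LAMTOP BUILDDIR [configure options...]
# Runs LAMTOP/configure from inside BUILDDIR so that all generated
# files land in BUILDDIR and none in the LAMTOP tree.
vpath_configure() {
    lamtop=$1
    builddir=$2
    shift 2
    mkdir -p "$builddir" || return 1
    # Subshell so the caller's working directory is unchanged.
    ( cd "$builddir" && "$lamtop/configure" "$@" )
}

# Example (paths are placeholders):
#   vpath_configure /path/to/lam-6.5.9 /tmp/build-tcp   --with-rpi=tcp
#   vpath_configure /path/to/lam-6.5.9 /tmp/build-usysv --with-rpi=usysv
```

Each build directory can then be built and installed independently
with "make" and "make install" run from inside it.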
The configure script will create several configuration files,
including share/include/lam_config.h. You may wish to inspect this
file for a sanity check, but ./configure usually guesses correctly.
There are many options available from the configure script. You
can use the command "./configure --help" to list them all. An
explanation of each follows (shown here in alphabetical order):
--disable-static
Do not build static libraries. This flag is only meaningful when
--enable-shared is specified; if this flag is specified without
--enable-shared, it is ignored, and static libraries are created.
--enable-echo
Will echo all of the commands that configure executes. This is
usually for debugging purposes only, and is not recommended for end
users.
--enable-shared
Build shared libraries. Note that this option is incompatible
with --with-romio (which is the default) and --with-mpi2cpp (which is
also the default) because (among other reasons) ROMIO expects to find
libmpi.a, not libmpi.so.
Also note that enabling building shared libraries does
not disable building the static libraries. Specifying
--enable-shared without --disable-static will result in a build taking
twice as long, and installing both the static and shared libraries.
Finally, note that neither ROMIO nor the MPI 2 C++ bindings
currently support shared libraries. They will always be built as
static libraries.
--prefix=PREFIX
Sets the installation location for the LAM binaries, libraries,
etc., to the directory PREFIX. PREFIX must be specified as an
absolute directory name.
--with-cc=CC
Use the C compiler CC. The C compiler can also be selected by
setting the "CC" environment variable before running configure. This
compiler will be used both to compile LAM, and as the default compiler
for the hcc(1) and mpicc(1) wrapper compilers.
--with-cflags=CFLAGS
Use the C compiler flags CFLAGS. The flags passed to the C
compiler can also be selected by setting the "CFLAGS" environment
variable before running configure. These flags are used to compile
LAM, ROMIO, and some example programs that come with LAM. If CFLAGS
are not specified, ./configure will pick optimization flags to use.
These flags are not used as default flags in any of the
wrapper compilers.
--with-cxx=CXX
Use the C++ compiler CXX. The C++ compiler can also be selected
by setting the "CXX" environment variable before running configure.
This compiler will be used to compile the MPI 2 C++ bindings, IMPI
support, and will be used as the default compiler for the hcp(1) and
mpiCC(1) wrapper compilers.
--with-cxxflags=CXXFLAGS
Use the C++ compiler flags CXXFLAGS. The flags passed to the C++
compiler can also be selected by setting the "CXXFLAGS" environment
variable before running configure. These flags will be used when
compiling the MPI 2 C++ bindings, IMPI support, as well as some
example programs that come with LAM. If CXXFLAGS are not specified,
./configure will pick optimization flags to use.
These flags are not used as default flags in any of the
wrapper compilers.
--with-cxxldflags=CXXLDFLAGS
Use the C++ linker flags CXXLDFLAGS. These flags will be used
when compiling the MPI 2 C++ bindings, IMPI support, as well as some
example programs that come with LAM. If CXXFLAGS are not
specified, ./configure will pick optimization flags to use.
These flags are not used as default flags in any of the
wrapper compilers.
--with-exceptions
Used to enable exception handling support in the C++ bindings for
MPI. Exception handling support (i.e., the
MPI::ERRORS_THROW_EXCEPTIONS error handler) is disabled by default.
See the section "MPI 2 C++ Issues", above.
--with-exflags=FLAGS
Used to specify any command line arguments that are necessary for
the C, C++, and Fortran compilers to enable C++ exception support.
This switch is ignored unless --with-exceptions is also specified.
This switch is unnecessary for gcc/g77/g++ version 2.95 and above
-- "-fexceptions" will automatically be used (when building
--with-exceptions). Additionally, this switch is unnecessary if the
KCC compiler is used -- "-x" is automatically used.
See the section entitled "MPI 2 C++ Issues", above.
--with-fc=FC
Use the Fortran compiler FC. Specify FC=no (or --without-fc) to
disable Fortran support if you do not have a Fortran compiler or do
not require such support. This compiler will be used both to compile
LAM, and as the default compiler for the hf77(1) and mpif77(1) wrapper
compilers.
--with-fflags=FFLAGS
Use the Fortran compiler flags FFLAGS when compiling LAM. The
flags passed to the Fortran compiler can also be selected by setting
the "FFLAGS" environment variable before running configure. These
flags will be used only when compiling some example programs that come
with LAM. If FFLAGS are not specified, ./configure will pick
optimization flags to use.
These flags are not used as default flags in any of the
wrapper compilers.
--with-impi
Use this switch to enable the IMPI extensions. The IMPI
extensions are still considered experimental, and are disabled by
default.
--with-lamd-ack=SEC
Number of seconds until an ACK is resent between LAM daemons. You
probably shouldn't need to change this; the default is one second.
--with-lamd-hb=SEC
Number of seconds between heartbeat messages in the LAM daemon
(only applicable when running in fault tolerant mode). You probably
shouldn't need to change this; the default is 120 seconds.
--with-lamd-boot=SEC
Set the default number of seconds to wait before a process started
on a remote node is considered to have failed (e.g., during lamboot).
You probably shouldn't need to change this; the default is 60 seconds.
--with-ldflags=LDFLAGS
Use the LD linker flags LDFLAGS. If this flag is not set on the
./configure command line, the value for CFLAGS is used. These flags
are used to link LAM executables and all example programs that come
with LAM. If LDFLAGS (and CFLAGS) are not specified, ./configure will
pick optimization flags to use.
These flags are not used as default flags in any of the
wrapper compilers.
--without-mpi2cpp
Build LAM without the MPI-2 C++ bindings (see chapter 10 of the
MPI-2 standard); the default is to build them. The C++ bindings
require some advanced features of the C++ compiler. While most modern
C++ compilers now support all the required features, you may encounter
problems on some platforms. Consult the mpi2c++/README file for more
information.
--without-profiling
Build LAM/MPI without the MPI profiling layer. The default is to
build this layer, since ROMIO uses it. See the --without-romio option
for more details.
--with-pthread-lock
Use a process shared pthread mutex to lock access to the shared
memory pool rather than the default SYSV semaphore. This option is
only valid with the "usysv" RPI, and on systems which support process
shared pthread mutexes.
--with-purify
Causes LAM to zero out all data structures before using them.
This option is not necessary to make LAM function correctly (LAM
already zeros out relevant structure members when necessary), but it
is very helpful when running MPI programs through memory checking
debuggers, such as purify and the Solaris Workshop bcheck program.
See the "Zeroing out LAM buffers before use" section of the
RELEASE_NOTES file for more information. The default is to not enable
this option.
--without-romio
Build LAM without ROMIO support (ROMIO provides the MPI-2 I/O
support, see chapter 9 of the MPI-2 standard); the default is to build
with ROMIO support. ROMIO is known to only work on certain
systems. Consult the romio/README file for more information. Note
that building ROMIO is incompatible with --enable-shared, because
(among other reasons) ROMIO expects to find libmpi.a, not libmpi.so.
Note also that building ROMIO implies building the profiling
layer. ROMIO makes extensive use of the MPI profiling layer; that is,
you cannot select --without-profiling without also specifying
--without-romio.
--with-romio-flags=FLAGS
Pass FLAGS to ROMIO's configure script when it is invoked during
the build process. This switch is to effect specific behavior in
ROMIO, such as building for a non-default file system (e.g., PVNFS).
Note that LAM already sends the following switches to ROMIO's
configure script -- the --with-romio-flags switch should not be used
to override them:
- --prefix
- -mpi
- -mpiincdir
- -cc
- -fc
- -debug (if -g is specified in CFLAGS)
- -cflags
- -fflags
- -nof77 (if --without-fc is selected in LAM)
- -make
- -mpilib
--with-rpi=RPI
Build with request progression interface (RPI) transport layer RPI
[RPI=tcp]. RPI must be one of: tcp, sysv, or usysv. If this option is
not specified, the RPI transport layer defaults to tcp. Please refer
to the RELEASE_NOTES file for descriptions of the RPI transport
layers.
--with-rsh=RSH
Use RSH as the remote shell command. For example if you want to
use the secure shell ssh then specify --with-rsh="ssh -x" (note that
the "-x" is necessary to prevent the ssh 1.x series of clients from
sending its standard banner information to standard error, which will
cause recon/lamboot/etc. to fail). This shell command will be used to
launch commands on remote nodes from binaries such as lamboot, wipe,
etc. The command can be one or more shell words, such as a command
and multiple command line switches.
This value can be overridden at recon/lamboot/etc. run time with
the LAMRSH environment variable. See the RELEASE_NOTES file for more
details.
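For instance, to switch to ssh for one session without reconfiguring,
LAMRSH can be set in a Bourne-style shell as sketched below. The
value mirrors the --with-rsh example above; the lamboot line is left
as a comment because it needs a working LAM installation.

```shell
# Override the remote shell agent that was compiled in by --with-rsh.
# "ssh -x" is the value suggested above for ssh 1.x clients.
LAMRSH="ssh -x"
export LAMRSH

# lamboot, recon, wipe, etc. started from this shell now use ssh:
#   lamboot -d
echo "LAMRSH is set to: $LAMRSH"
```

Because the variable is exported, it applies to every LAM tool started
from this shell until the shell exits or the variable is unset.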
--with-select-yield
Force the use of select() to yield the processor.
--with-shm-maxalloc=BYTES
Use BYTES as the size of the maximum allocation from the shared
memory pool. If no value is specified, configure will set the size
according to the value of shm-poolsize (below). See "Usysv and Sysv
transports", below.
--with-shm-poolsize=BYTES
Use BYTES as the size of the shared memory pool. If no size is
specified, configure will determine a suitably large size to use. See
"Usysv and Sysv transports", below.
--with-shm-short=BYTES
Use BYTES as the maximum size of a short message when
communicating via shared memory. Default is 8 KB.
--without-shortcircuit
Disable the send/receive short circuiting optimization. The short
circuit optimization has proven to be fairly stable, and this option
is not usually necessary. It remains for hysterical raisins.
--with-signal=SIGNAL
Use SIGNAL as the signal used internally by LAM. The default value
is "SIGUSR2". To set the signal to "SIGUSR1" for example, specify
--with-signal=SIGUSR1.
--with-tcp-short=BYTES
Use BYTES as the maximum size of a short message when
communicating over TCP. Default is 64 KB. This is relevant to all
RPIs, since the shared memory RPIs are multi-protocol -- they will use
TCP when communicating with MPI ranks that are not in the same node.
--with-thread
This option is not yet supported. Do not use it.
--with-trillium
Build and install the Trillium support executables, header files,
and man pages. These extra Trillium executables, header files, and
man pages are not necessary for normal MPI operation; they are
intended for Trillium developers and certain third party products that
interact with the lower layer of LAM/MPI. Building XMPI
(http://www.lam-mpi.org/software/xmpi/), for example, requires that
all the Trillium header files were previously installed. Hence, if
you intend to compile XMPI after installing LAM/MPI, you should use
this option.
Building the extra Trillium executables and installing the
Trillium header files and man pages used to be the default in prior
versions of LAM/MPI. However, since few users actually used them, it
has been relegated to an option.
Example:
% ./configure --with-rpi=usysv --with-cc=/bin/cc \
--with-cflags=-O4 --without-fc
Compile for the usysv RPI using the C compiler /bin/cc with
options -O4 and disable Fortran support.
LAM has been verified as being 64 bit clean under Solaris 7, AIX
4.3.3, IRIX 6.5, and Alpha/Linux 2.2.x. To compile LAM with the 64
bit architecture, you will likely need to add compiler and linker
flags with configure. For example, if you are using the Solaris
Workshop 5.0 compilers on Solaris 7, you can use the following:
% ./configure --with-cflags='-xarch=v9' --with-ldflags='-xarch=v9'
Other compilers/architectures will have their own flags to enable
64 bit compilation; consult the documentation for your compiler. Of
course, you can also add in any debugging/optimization flags in the
cflags and ldflags strings as well.
Once the configuration step has completed, build LAM by doing:
% make
in the top level LAM directory. This will build the LAM binaries and
libraries within the distribution source tree. Once they have
compiled properly, you can install them with:
% make install
NOTE: Previous versions of LAM included "make
install" in the default "make". THIS IS NO LONGER TRUE. You
must execute "make install" to install the LAM executables,
libraries, and header files to the location specified by the --prefix
option to configure.
LAM and the ROMIO and MPI-2 C++ packages all include example code
that can be built with a single top-level "make examples". Note that
the examples can only be built after a successful "make
install", and after $prefix/bin has been placed in your $path.
% make examples
This will do the following (where TOPDIR is the top-level
directory of the LAM source tree):
- Build the LAM examples. They are located in:
TOPDIR/examples
- If LAM was configured to build the C++ examples (i.e., if you did
not configure with --without-mpi2cpp), the MPI 2 C++ examples
will be built. They are located in:
TOPDIR/mpi2c++/contrib
- If you configured LAM with ROMIO support (i.e., if you did not
configure with --without-romio), the ROMIO examples will be
built. See the notes about ROMIO in the RELEASE_NOTES file.
They are located in:
TOPDIR/romio/test
Additionally, the following three commands can be used to build
each of the packages' examples separately (provided that support for
each was compiled in to LAM) from TOPDIR:
% make lam-examples
% make romio-examples
% make mpi2c++-examples
A boot schema is a description of a multicomputer on which LAM
will be run. You can create boot schema files (see bhost(5) for
syntax) for typical configurations of the local multicomputer(s).
Place these files under etc/ in the installation directory. They will
be found by LAM tools such as lamboot(1), recon(1) and wipe(1) if you
do not specify a filename on the command line to use instead of the
default.
The default etc/lam-bhost.def file comes with a single line:
localhost
So that if you simply do "lamboot", you will get a LAM with one
node (the localhost) booted.
You can re-write the etc/lam-bhost.def file if you are frequently
going to boot LAM to the same configuration. For example, if you
frequently use 4 workstations: inky, blinky, pinky, and clyde, you can
have an etc/lam-bhost.def file as follows:
inky
blinky
blinky
blinky
blinky
pinky cpu=2
clyde user=lamrocks
Note that "blinky" is listed 4 times. This tells LAM/MPI that
blinky has 4 CPUs (relevant for the "C" notation to the mpirun
command; see mpirun(1)). An alternate (and equivalent) notation is
used for pinky -- "cpu=2" specifies that pinky has 2 CPUs.
You can also specify different remote usernames on the remote
nodes; the username "lamrocks" is used on the machine "clyde" in the
above example.
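To sanity-check a boot schema before booting, the CPU counts implied
by the file can be tallied with a short script. This sketch handles
only the syntax shown above (one host per line, an optional cpu=N key,
and repeated host lines each adding CPUs). Skipping lines that start
with "#" is an assumption; see bhost(5) for the full syntax. The
function name count_cpus is made up for this example.

```shell
# count_cpus FILE
# Print "host N" for each host in first-seen order, then "total N".
count_cpus() {
    awk '
        /^[ \t]*(#|$)/ { next }        # skip comment and blank lines
        {
            n = 1                      # a bare host line counts as 1 CPU
            for (i = 2; i <= NF; i++)
                if ($i ~ /^cpu=[0-9]+$/)
                    n = substr($i, 5) + 0
            if (!($1 in cpus))
                order[++k] = $1        # remember first-seen order
            cpus[$1] += n
            total += n
        }
        END {
            for (i = 1; i <= k; i++)
                print order[i], cpus[order[i]]
            print "total", total + 0
        }
    ' "$1"
}
```

Applied to the seven-line example above, the tally is 1 CPU for inky,
4 for blinky, 2 for pinky, and 1 for clyde, for a total of 8.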
If the LAM installation directory is moved after it is built,
users must set the LAMHOME environment variable to the new location.
This is the only case where the LAMHOME environment variable
should be set -- otherwise, it should be left unset. See "The LAMHOME
and TROLLIUSHOME environment variables", below.
On each UNIX machine, users must add the LAM executable directory
to their shell's search path. LAM executables are found under
$prefix/bin. These steps must be taken on each and every machine that
might be part of a multicomputer running LAM. Set the variables in
the shell's start-up file, not the .login file.
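For a Bourne-style shell, the start-up file entry might look like the
sketch below. LAM_PREFIX is a hypothetical variable introduced here
for readability; set it to the value that was given to --prefix. The
guard avoids adding the directory twice when the file is read more
than once.

```shell
# Hypothetical helper variable: the directory passed to --prefix.
LAM_PREFIX=/usr/local

# Append $LAM_PREFIX/bin to PATH only if it is not already there.
case ":$PATH:" in
    *":$LAM_PREFIX/bin:"*) ;;
    *) PATH="$PATH:$LAM_PREFIX/bin" ;;
esac
export PATH
```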
LAM is a daemon-based implementation of MPI. This means that a
daemon process is launched on each machine that will be in the
parallel environment. Once the daemons have been launched, LAM is
ready to be used. A typical usage scenario is as follows:
- Boot LAM on all the nodes
- Run MPI programs
- Shut down LAM
LAM does not need to be booted in order to compile MPI programs.
LAM is a user-based MPI environment; each user who wishes to use LAM
must boot their own LAM environment. LAM is not a client-server
environment where a single LAM daemon can service all LAM users on a
given machine. There are no future plans to make LAM client-server
oriented (unless someone volunteers to write it :-).
As a side-effect of this design, each user must have an account on
each machine that they wish to use LAM on.
Note that it is typically not necessary to set the
LAMHOME and/or TROLLIUSHOME environment variables. These variables
are only necessary if the $prefix of the LAM installation is
moved after "make install" was run.
As such, there are very few cases when one would need to set
LAMHOME or TROLLIUSHOME. The LAM Team recommends that you leave these
variables unset.
The recon(1) tool checks if LAM can be started on the given boot
schema. There are several prerequisites that enable LAM to be started
on a remote machine:
- The machine must be reachable and operational.
- The user must have an account on the machine.
- The user must be able to rsh(1) to the machine (typically,
permissions must be set in the user's .rhosts file on the
machine).
- The user must be able to write to /tmp.
- The LAM executables must be locatable on that machine, using
the shell's search path and possibly the LAMHOME environment
variable, as described above.
- The shell's start-up script must not print anything on standard
error. The user can take advantage of the fact that rsh(1) will
start the shell non-interactively. The start-up script can exit
early in this case, before executing many commands relevant
only to interactive sessions and likely to generate output.
All of these prerequisites must be met before LAM will
function properly. If recon does not complete successfully, the "-d"
option will give verbose descriptions of what it tried to do, and
suggestions to fix the problem.
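One common way to make a Bourne-style start-up script stop early for
non-interactive shells is to test the shell's option flags in $-,
which contain the letter "i" only in interactive shells. The sketch
below wraps that test in a function so it can be tried in isolation;
the function name is made up for this example.

```shell
# Succeeds if the given option-flag string (normally "$-") marks an
# interactive shell, i.e. contains the letter "i".
is_interactive() {
    case "$1" in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# In a start-up file, the check would sit near the top, before any
# command that might write to standard output or standard error:
#   is_interactive "$-" || return 0
if is_interactive "$-"; then
    echo "interactive shell: full start-up would run"
fi
```

When rsh(1) starts the shell non-interactively, the test fails and
nothing is printed, which is what recon and lamboot expect.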
Also keep in mind that just because recon works, lamboot itself
may still fail. This usually happens when the "hboot" program (that
lamboot invokes on remote nodes) fails for some reason. Again, the
"-d" option to lamboot will enable extremely verbose output, and
suggest solutions to common problems.
Users should read the lam(7) manual page to get started using LAM
tools and libraries.
Additionally, the University of Notre Dame offers a "Getting
Started with LAM" tutorial, that, although somewhat biased towards the
LAM Team's computing environment, is a good starting point to getting
familiar with LAM.
http://www.lam-mpi.org/tutorials/lam/
A common environment to run LAM is in a Beowulf-class or other
workstation cluster. Simply stated, LAM can run on a group of
workstations connected by a network. As mentioned above, there are
several prerequisites, however (the user must have an account on all
the machines, the user can rsh [or ssh, or whatever other remote shell
transport capability is desired -- see above for how to change the
underlying remote shell transport] to all the machines, etc.).
This raises the question for LAM system administrators: where to
install the LAM binaries, header files, etc.? There are two main
choices:
- Have a common filesystem, such as NFS, between all the machines
to be used. Install the LAM files such that the LAM executables can
be found in the same directory on each node. This will
greatly simplify users' .cshrc/.profile scripts -- the value
of the $PATH can be set without checking which machine the user is on.
It also simplifies the system administrator's job; when the time comes
to patch or otherwise upgrade LAM, only one copy needs to be modified.
For example, consider a cluster of four machines: inky, blinky, pinky,
and clyde. If the LAM binaries et al. are installed on inky's local
hard drive in the directory /home/lam, the system administrator has
two main choices:
- mount inky:/home/lam on the remaining three machines, such that
/home/lam on all machines is effectively "the same". That is, the
following directories all contain the LAM binaries:
- inky:/home/lam
- blinky:/home/lam
- pinky:/home/lam
- clyde:/home/lam
- mount inky:/usr/local/src/lam-6.5.9 on all
four machines in some other common location, such as /home/lam (a
symbolic link can be installed on inky instead of a mount point for
efficiency). This strategy is typically used for environments where
one tree is NFS exported, but another tree is typically used for the
location of binaries. For example, the following directories all
contain the LAM binaries:
- inky:/home/lam
- blinky:/home/lam
- pinky:/home/lam
- clyde:/home/lam
Notice that there are the same four directories as the previous
example, but on inky, the directory is actually located in
/usr/local/src/lam-6.5.9. There is a bit of a
disadvantage in this approach; each of the remote nodes has to incur
NFS (or whatever filesystem is used) delays to access the LAM
directory tree. However, the ease of administration and the
(relatively speaking) low overhead of using a networked file system
usually greatly outweigh this disadvantage.
- If you are concerned with networked filesystem costs of accessing
the LAM binaries, you can install LAM on the local hard drive of each
node in your system. Again, it is highly advisable to
install LAM in the same directory on each node so that each user's
$PATH can be set to the same value, regardless of the node that a user
has logged on to.
This approach will save some network latency of accessing the LAM
binaries, but is only used where users are very concerned about
squeezing every spare cycle out of their machines.
AFS has some peculiarities, especially with file permissions when
using rsh. However, most sites tend to install the Transarc rsh
replacement (i.e., the one that passes tokens to the remote machine)
as the default rsh, so when you "rsh" to a remote machine (with recon
or lamboot), your AFS token will be passed to the remote LAM daemon
automatically. If your site does not install the Transarc replacement
rsh as the default, consult the documentation on "--with-rsh" (above)
to see how to set the path to the rsh that LAM will use.
Once you use the replacement rsh, you should get a token on the
other side. This means that your LAM daemons are running with your
AFS token, and you should be able to run any program that you wish,
including those that are not system:anyuser accessible. You will even
be able to write into your filespace (as you would expect).
Keep in mind, however, that AFS tokens have limited lives, and
will eventually expire. This means that your LAM daemons (and user
MPI programs) will lose their AFS permissions after some specified
time unless you renew your token (with the "klog" command, for
example) on the originating machine before the token runs out. This
can play havoc with long-running MPI programs that periodically write
out file results; if you lose your AFS token in the middle of a run,
and your program tries to write out to a file, it won't have
permission to, which may cause Bad Things to happen.
If you need to run long MPI jobs with LAM on AFS, it is usually
advisable to ask your AFS administrator to increase your default token
life time to a large value, such as 2 weeks.
Note that you can change the remote transport agent that LAM uses
to spawn the LAM daemons. While rsh is the default, it can be changed
to other agents, such as ssh.
ssh is a popular choice because of the added security that it provides
over the .rhosts security provided by rsh. And since ssh can pass AFS
tokens, it presents an attractive, highly secure, yet
fully-AFS-authenticated method, for invoking LAM.
If you choose to use ssh, the 1.x series of ssh will require the
use of the "-x" command line flag to prevent ssh from printing its
standard banner information to stderr. lamboot, recon, etc. interpret
information on stderr to mean that a remote invocation has failed;
ssh's "-x" will prevent this. (We do not have access to SSH 2.x
clients -- they may require a similar command line flag.)
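As an illustration of switching the agent at run time, the following is a minimal sketch for a Bourne-style shell. It assumes that your LAM version honors the LAMRSH environment variable as an override for the agent chosen with "--with-rsh"; check the lamboot documentation for your release before relying on it.

```shell
#!/bin/sh
# Sketch: select ssh (with -x) as the remote agent for this session.
# Assumption: LAMRSH overrides the agent chosen at configure time
# with --with-rsh; verify this against your LAM documentation.
LAMRSH="ssh -x"
export LAMRSH
echo "LAM will start remote daemons with: $LAMRSH"
# Afterwards, run recon or lamboot as usual.
```

csh-style shells would use setenv instead of the assignment and export shown here.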
Note that using ssh (or any other agent) only changes the way that
LAM is invoked. Once LAM is invoked, it sets up its own
sockets for communication that are outside of ssh (and are therefore
not encrypted). ssh provides stronger security only during lamboot
and wipe. Once the LAM daemons are launched, all MPI meta information
(such as startup of user programs) is passed through separate channels
that are independent of ssh.
It is highly recommended that you execute the following steps
in order. Many people have similar problems with
configuration and initial setup of LAM, and most common problems have
already been answered in one way or another.
- Check the LAM FAQ:
http://www.lam-mpi.org/faq/
- Check the mailing list archives. Use the "search" features to
check old posts and see if others have asked the same question and had
it answered:
http://www.lam-mpi.org/MailArchives/lam/
- If you do not find a solution to your problem in the above
resources, and your problem specifically has to do with
building LAM, send the following information to the LAM
mailing list (see the next section below about sending mail to the LAM
mailing list):
- The result of "uname -a" on your system
- The result of "./config/config.guess" from the top-level LAM source
directory.
- Output from when you ran "./configure" to configure LAM
- The config.log file from the top-level LAM directory
- The share/include/lam_config.h file
- Output from when you ran "make" to build LAM
To capture the output of the configure and make steps, you can use
the script command or, if using a csh-style shell, the following
technique:
% ./configure {options} |& tee config.LOG
% make install |& tee make.LOG
or if using a Bourne style shell:
% ./configure {options} 2>&1 | tee config.LOG
% make install 2>&1 | tee make.LOG
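The items requested above can be gathered into a single file before posting. The following is a minimal sketch for a Bourne-style shell; the function name collect_lam_report and the output file name lam-report.txt are illustrative and not part of LAM. It assumes that config.LOG and make.LOG were captured in the top-level LAM directory as shown above, and it notes any file that is missing rather than failing.

```shell
#!/bin/sh
# Sketch: gather build-problem information into one report file.
# collect_lam_report and lam-report.txt are hypothetical names.
collect_lam_report() {
    srcdir=${1:-.}               # top-level LAM source directory
    report=${2:-lam-report.txt}  # file to write the report to
    {
        echo "== uname -a =="
        uname -a
        echo "== config.guess =="
        if [ -x "$srcdir/config/config.guess" ]; then
            "$srcdir/config/config.guess"
        else
            echo "(config/config.guess not found)"
        fi
        # Captured output, config.log, and the generated config header
        for f in config.LOG config.log share/include/lam_config.h make.LOG; do
            echo "== $f =="
            if [ -r "$srcdir/$f" ]; then
                cat "$srcdir/$f"
            else
                echo "($f not found)"
            fi
        done
    } > "$report" 2>&1
}
```

From the top-level LAM directory, "collect_lam_report . lam-report.txt" would write the report into the current directory.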
There are two mailing lists: one for LAM/MPI announcements, and
another for questions and user discussion of LAM/MPI.
- Announcement list.
This is a low-volume list that is used to announce new versions of
LAM/MPI, important patches, etc. To subscribe to the LAM announcement
list, visit its list information page (you can also use that page to
unsubscribe or change your subscription options):
http://www.lam-mpi.org/mailman/listinfo.cgi/lam-announce
- General discussion/user list.
This list is used for general questions and discussion of LAM/MPI.
Users can post questions, comments, etc. to this list. Due to problems
with spam, only subscribers are allowed to post to the list. To
subscribe or unsubscribe from the list, visit the list information
page:
http://www.lam-mpi.org/mailman/listinfo.cgi/lam/
After you have subscribed (and received a confirmation e-mail),
you can send mail to the list at the following address:
YOU MUST BE SUBSCRIBED IN ORDER TO POST TO THE LIST
lam at lam dash mpi dot org
YOU MUST BE SUBSCRIBED IN ORDER TO POST TO THE LIST
NOTE: People tend to only reply to the list; if
you subscribe, post, and then unsubscribe from the list, you will
likely miss replies.
Also please be aware that lam at lam dash mpi dot org is a list that goes to
several hundred people around the world -- it is not uncommon to move
a high-volume exchange off the list, and only post the final
resolution of the problem/bug fix to the list. This prevents
exchanges like "Did you try X?", "Yes, I tried X, and it did not
work.", "Did you try Y?", etc. from cluttering up people's inboxes.
Check the LAM FAQ and mailing list archive resources mentioned in
the previous section (Problems with building LAM). If you do not find
the solution to your problem there, send mail to the LAM mailing list:
lam at lam dash mpi dot org.
Some typical problems with rsh include the following:
- Incorrect permissions on a user's home directory
- Incorrect permissions on $HOME/.rhosts
- No entry (or incorrect entry) in $HOME/.rhosts
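The first two items can be checked mechanically. The sketch below assumes that "incorrect permissions" means the home directory or .rhosts file is writable by group or others, which many rsh implementations reject; your system's rules may differ. The function names are illustrative, and the contents of .rhosts are not inspected.

```shell
#!/bin/sh
# Sketch: flag permissions that commonly make rsh refuse a login.
# Assumption: group- or world-writable files are the problem case.
writable_by_others() {
    # Succeeds if the path is group- or world-writable.
    perms=$(ls -ld "$1" | cut -c1-10)
    case "$perms" in
        ?????w????|????????w?) return 0 ;;
        *) return 1 ;;
    esac
}

check_rsh_files() {
    home=${1:-$HOME}
    status=0
    if writable_by_others "$home"; then
        echo "WARNING: $home is writable by group or others"
        status=1
    fi
    if [ ! -f "$home/.rhosts" ]; then
        echo "WARNING: $home/.rhosts does not exist"
        status=1
    elif writable_by_others "$home/.rhosts"; then
        echo "WARNING: $home/.rhosts is writable by group or others"
        status=1
    fi
    return $status
}
```

Running "check_rsh_files" with no argument on each node checks the current user's home directory.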
Some typical problems with a user's environment include the
following:
- User's .cshrc/.profile does not put $prefix/bin in the path
- Inaccessible permissions on the program that you are trying to
run
- Inaccessible permissions on the /tmp directory
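A quick way to test for the first and last of these is sketched below. The function name check_lam_env is illustrative; the sketch only checks the environment of the shell it runs in, so it should be run on each node, ideally through the same remote agent that LAM uses.

```shell
#!/bin/sh
# Sketch: check that LAM binaries are on the PATH and that the
# temporary directory is usable. check_lam_env is a hypothetical name.
check_lam_env() {
    tmpdir=${1:-/tmp}
    status=0
    if ! command -v lamboot >/dev/null 2>&1; then
        echo "WARNING: lamboot not found; add \$prefix/bin to your path"
        status=1
    fi
    if [ ! -d "$tmpdir" ] || [ ! -w "$tmpdir" ] || [ ! -x "$tmpdir" ]; then
        echo "WARNING: $tmpdir is not a writable, searchable directory"
        status=1
    fi
    return $status
}
```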
When using the sysv or usysv RPIs, the operating system may run
out of shared memory and/or semaphores. This is typically indicated
by failing to run an MPI program, or failing to run more than X copies
of an MPI program on a single node.
To fix this problem, your operating system settings need to be
modified to increase the allowable shared semaphores/memory.
For Linux, reconfiguration can only be done by building a new
kernel. First modify the appropriate constants in
include/asm-[arch]/shmparam.h or include/linux/shm.h. Increasing
SHMMAX will allow larger shared segments and increasing _SHM_ID_BITS
allows for more shared memory identifiers (this information is likely
from 2.0/2.2 Linux kernels; it may or may not have changed in more
recent versions).
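Before rebuilding anything, it can help to see what limits are currently in effect. The sketch below assumes a kernel that exposes its System V IPC limits under /proc/sys/kernel, which is not the case for the older kernels described above; where the files are absent it simply says so. The function name show_sysv_limits is illustrative.

```shell
#!/bin/sh
# Sketch: print current System V IPC limits where the kernel exposes
# them. Assumption: /proc/sys/kernel/{shmmax,shmmni,sem} exist only on
# kernels newer than those described in the text.
show_sysv_limits() {
    for f in /proc/sys/kernel/shmmax /proc/sys/kernel/shmmni \
             /proc/sys/kernel/sem; do
        if [ -r "$f" ]; then
            echo "$f: $(cat "$f")"
        else
            echo "$f: not available on this system"
        fi
    done
}
show_sysv_limits
```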
For Solaris, reconfiguration can be done by modifying /etc/system and
then rebooting. See the Solaris man page system(4).
For example, to set the maximum shared memory segment size to 32 MB,
put the following in /etc/system:
set shmsys:shminfo_shmmax=0x2000000
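The hexadecimal value is simply the size in bytes. As a small sketch, the helper below (shmmax_line is an illustrative name) prints the /etc/system line for a size given in megabytes:

```shell
#!/bin/sh
# Sketch: print an /etc/system line for a shared memory segment limit
# given in megabytes. shmmax_line is a hypothetical helper name.
shmmax_line() {
    printf 'set shmsys:shminfo_shmmax=0x%x\n' $(( $1 * 1024 * 1024 ))
}
shmmax_line 32   # prints: set shmsys:shminfo_shmmax=0x2000000
```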
If you are using the sysv transport and are running out of
semaphores, then the following tunables can be set:
set semsys:seminfo_semmap=32
set semsys:seminfo_semmni=128
set semsys:seminfo_semmns=1024
Please consult your system documentation for help in determining
the correct values for your systems.
After LAM has been built, all of the objects can be removed by
running the make(1) utility with the "clean" target in the source
directory.
% make clean
NOTE: If you are using a really picky version of
make (such as OpenBSD's make), you may need to use "make -i clean".
If you're really desperate for disk space, a bit more can be
reclaimed by running:
% make distclean
NOTE: Again, if you are using a really picky
version of make (such as OpenBSD's make), you may need to use "make -i
distclean".
If further space is required, the entire source directory can be
taken off-line (indeed, "make distclean" returns the LAM source tree
to the same state as it was when it was unpacked from the original
distribution tarball). Only the installation directory need be
maintained on-line.
There are various constants defined in the LAM header files which
relate to message transfer protocols, shared memory allocation, and so
on. Some of these are configurable via the configure script; it is
hoped that in time, more and more options will be configurable.
This section is intended to describe some of these constants so
that LAM users can experiment with tuning the MPI library. It also
provides some description of the transport layer internals which may
help LAM users better understand the behavior and performance they see
from the LAM MPI library.
LAM MPI uses a short/long message protocol. If a message is
"short", it is sent together with a header in one transfer to the
destination process. If the message is "long", then a header
(possibly with some data) is sent to the destination. The sending
process then waits for an acknowledgment from the receiver before
sending the rest of the message data. The receiving process sends the
acknowledgment when a matching receive is posted.
The crossover point from "short" to "long" message is configurable
in each transport. See the transport-specific sections (tcp, sysv, and
usysv) for further information.
Typically, when a message is sent or received, LAM creates a
request structure, fills it with information about the message, links
the request into a list of messages, and calls a progression "engine"
to effect the data transfer.
When there are no active requests and a blocking (standard mode)
send or receive is done, the overhead of creating the request and
linking it into the list can be bypassed (shortcircuited) and the
progression "engine" called directly to effect the transfer.
In prior versions of LAM/MPI, this option was not the default. It
is now used by default, unless specifically disabled via the configure
script.
The crossover point from "short" to "long" message is configurable
via the constant TCPSHORTMSGLEN in share/include/lam_config.h
(relative to the top of the LAM build tree). It can also be set from
the configure script via the --with-tcp-short option. The default is
64KB.
This number is relevant to all the RPIs. The shared memory RPIs
are multi-protocol; they use TCP to communicate with ranks that are
not on the same node.
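The short/long decision can be pictured with the small sketch below. It is not LAM's code: the function name message_protocol is illustrative, and treating a message whose length equals the crossover as "short" is an assumption, since the text does not say which side the boundary falls on.

```shell
#!/bin/sh
# Sketch of the short/long decision for a given message length.
# Assumption: a length equal to the crossover counts as short.
TCPSHORTMSGLEN=$((64 * 1024))   # default TCP crossover, in bytes

message_protocol() {
    # $1 = message length in bytes
    # $2 = crossover in bytes (defaults to the TCP value)
    if [ "$1" -le "${2:-$TCPSHORTMSGLEN}" ]; then
        echo short
    else
        echo long
    fi
}
message_protocol 1024      # prints: short
message_protocol 1048576   # prints: long
```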
Descriptions of the usysv and sysv transports can be found in the
"RPI transport layers" section of the RELEASE_NOTES file.
Configuration constants for the usysv and sysv transports are
found in share/include/rpi.shm.h (from the top of the LAM build
directory).
In these transports, processes on different nodes communicate via
TCP sockets. The crossover point from "short" to "long" messages for
these communications is configurable via the constant TCPSHORTMSGLEN.
It can also be set from the configure script via the --with-tcp-short
option. The default is 64KB.
Processes located on the same node communicate via shared memory.
The transport allocates one SYSV shared segment shared by all
processes in the tasks which are on the node. This segment is
logically divided into two areas.
The "postbox" area contains postboxes for "short" message
communication. A postbox is used for communication one-way between
two processes. The space allocated per postbox is SHMSHORTMSGLEN +
CACHELINESIZE. SHMSHORTMSGLEN is configurable (via the configure
option --with-shm-short). It is the crossover point from "short"
to "long" messages in shared memory communication; the default value
is 8 KB.
CACHELINESIZE must be the size of a cache line or a multiple
thereof. The default setting is 64 bytes. You shouldn't need to
change it. CACHELINESIZE bytes in the postbox are used for a
cache-line sized synchronization location.
The size of the postbox area is np * (np-1) * (SHMSHORTMSGLEN +
CACHELINESIZE) bytes.
The rest of the shared memory area is used as a global pool from
which space for long message transfers is allocated. Allocation from
this pool is locked. The default lock mechanism is a SYSV semaphore
but the configure option --with-pthread-lock can be used to change
this to a process shared pthread mutex lock. The size of this pool is
configurable via the constant LAM_MPI_SHMPOOLSIZE, and by the
configure option --with-shm-poolsize.
The configure script will try to determine a size for the pool if
none is explicitly specified. You should always check this to see if
it is reasonable. Larger values should improve performance especially
when an application passes large messages, but will also increase the
system resources used by each task.
The total size of the shared segment allocated is 2 * CACHELINESIZE
+ LAM_MPI_SHMPOOLSIZE + np * (np-1) * (SHMSHORTMSGLEN + CACHELINESIZE).
The 2 * CACHELINESIZE bytes are for the global pool lock.
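To get a feel for these sizes, the formulas can be evaluated directly. The sketch below uses the default values quoted above for CACHELINESIZE and SHMSHORTMSGLEN, and assumes np is the number of processes sharing the segment on one node. The 16 MB pool size in the example is purely illustrative; use the value that configure chose for your build.

```shell
#!/bin/sh
# Sketch: evaluate the shared segment size formulas.
# Assumption: np = number of processes on the node sharing the segment.
CACHELINESIZE=64                 # default, in bytes
SHMSHORTMSGLEN=$((8 * 1024))     # default, in bytes

postbox_area_bytes() {           # $1 = np
    echo $(( $1 * ($1 - 1) * (SHMSHORTMSGLEN + CACHELINESIZE) ))
}

shm_segment_bytes() {            # $1 = np, $2 = LAM_MPI_SHMPOOLSIZE
    echo $(( 2 * CACHELINESIZE + $2 + $(postbox_area_bytes "$1") ))
}

postbox_area_bytes 4             # prints: 99072
shm_segment_bytes 4 16777216     # prints: 16876416
```

Because the postbox area grows with np * (np-1), it quadruples roughly every time the number of processes on a node doubles.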
When a message larger than 2 * SHMSHORTMSGLEN is sent, the transport
sends SHMSHORTMSGLEN bytes with the first packet. When the
acknowledgment is received, it allocates (message length -
SHMSHORTMSGLEN) bytes from the global pool to transfer the rest of the
message.
To prevent a single large message transfer from monopolizing the
global pool, allocations from the pool are actually restricted to a
maximum of LAM_MPI_SHMMAXALLOC bytes. Even with this restriction, it
is possible for the global pool to temporarily become exhausted. In
this case, the transport will fall back to using the postbox area to
transfer the message. Performance will be degraded, but the
application will progress.
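The arithmetic of a long shared memory transfer can be sketched as follows. This is an illustration, not LAM's implementation: it assumes that the remainder of the message is moved in successive pool allocations of at most LAM_MPI_SHMMAXALLOC bytes each, which the text implies but does not state. The function name and the 256 KB cap in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: split a long shared memory message into the first packet and
# the part taken from the global pool.
# Assumption: the remainder moves in ceil(rest / maxalloc) allocations.
SHMSHORTMSGLEN=$((8 * 1024))     # default, in bytes

shm_long_transfer_plan() {
    len=$1        # message length in bytes
    maxalloc=$2   # LAM_MPI_SHMMAXALLOC in bytes
    if [ "$len" -le $((2 * SHMSHORTMSGLEN)) ]; then
        echo "not larger than 2 * SHMSHORTMSGLEN; case not covered here"
        return 0
    fi
    rest=$((len - SHMSHORTMSGLEN))
    chunks=$(( (rest + maxalloc - 1) / maxalloc ))
    echo "first packet: $SHMSHORTMSGLEN bytes; from pool: $rest bytes in $chunks allocation(s)"
}
shm_long_transfer_plan 1048576 262144
```

For a 1 MB message and a 256 KB cap, 1040384 bytes remain after the first packet, which needs four allocations under the stated assumption.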
LAM_MPI_SHMMAXALLOC is configurable via the configure option
--with-shm-maxalloc or editing rpi.shm.h.
The usysv and sysv transports differ only in the mechanism used to
synchronize the transfer of messages via shared memory. The usysv
transport uses spin locks with back-off, while the sysv transport uses
SYSV semaphores.
Both transports use a few SYSV semaphores for synchronizing the
deallocation of shared structures or for synchronizing access to the
shared pool.
The usysv transport should be superior to the sysv transport on
multiprocessors. On uniprocessors, which is better depends on the OS
and the means used for processor yielding. On a Linux uniprocessor,
for example, using semaphores (sysv transport) appears to be vastly
superior to spin-locking.
The usysv transport uses spin locks with back-off. When a process
backs off, it attempts to yield the processor. If the configure
script found a system provided yield function such as yield() or
sched_yield(), this is used. If no such function is found, then
select() on NULL file descriptor sets with a timeout of 10us is used.
The use of select() to yield can be forced by the
--with-select-yield option to the configure script.
The sysv transport allocates a semaphore set (of size 6) for each
process pair communicating via shared memory. On some systems, you
may need to reconfigure the system to allow for more semaphore sets if
running tasks with many processes communicating via shared memory.
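A rough estimate of the demand helps when choosing system limits. The sketch below assumes that "process pair" means an unordered pair and that every pair of processes on the node communicates, giving np * (np-1) / 2 sets of 6 semaphores; it ignores the few additional semaphores both transports use. The function name is illustrative.

```shell
#!/bin/sh
# Sketch: estimate semaphore sets and semaphores used by the sysv
# transport on one node.
# Assumption: one set of 6 per unordered pair of local processes.
sysv_semaphore_estimate() {      # $1 = processes on the node
    pairs=$(( $1 * ($1 - 1) / 2 ))
    echo "$pairs sets, $(( pairs * 6 )) semaphores"
}
sysv_semaphore_estimate 8    # prints: 28 sets, 168 semaphores
sysv_semaphore_estimate 16   # prints: 120 sets, 720 semaphores
```

Under these assumptions, 16 processes on a node would fit within limits such as semmni=128 and semmns=1024, while 17 processes (136 sets) would not.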