Table of contents:
- Is there information about how LAM works internally?
- Why does LAM use these annoying daemons?
- But no other MPI implementation uses daemons...?
- Is LAM multi-threaded?
- Is LAM thread safe?
- Does LAM/MPI provide asynchronous message passing progress?
- How does I/O work in LAM?
- Can I have access to the LAM code repository?
- Is LAM Y2K compliant?
|
1. Is there information about how LAM works internally? |
Yes and no. Starting with v7.0, LAM has transformed into a component
architecture -- many of its services are performed by components that
are tied together by a back-end framework. While many portions of the
LAM run-time environment and the MPI communication layer are part of
this framework (and not components in themselves, and therefore are
not documented), the component types are all formally documented.
Specifically, API documents for each of the component types are
available on the LAM download site.
These documents describe how each component works and provide insight into how
the overall framework functions. Beyond those documents and the
other FAQ questions (mainly in this section), there is little
additional formal documentation
on how LAM works internally.
Additionally, in the LAM download area is a paper entitled "The
XMPI API and trace file format" (filename xmpi_api.ps)
that details both how XMPI extracts run-time trace information from
LAM and the format of tracefiles that LAM can produce with the
-ton command line option to mpirun. The
format of the file and the format of the data that XMPI receives are
the same; the mechanism for obtaining the two is slightly different.
There are some other papers in the download area that discuss some
of LAM's internals, but they are pretty much broad overviews of the
techniques that LAM uses.
There are, however, a bunch of manual pages on the Trollius
library API calls (recall that LAM/MPI is a layer on top of an
underlying message passing system named Trollius) included in the LAM
distribution. Look in sections 2 and 3 of the LAM man pages for more
information on these calls.
Additionally, the entire source code to LAM is provided in the
download distribution. Feel free to "source dive" and investigate how
LAM works yourself. This is probably the most reliable way to get
information, unfortunately.
The current LAM development tree is also available through
anonymous CVS. See the LAM CVS web page
for details.
The LAM Team may be able to answer some questions about the LAM
implementation, but will more than likely only be able to direct you
to relevant parts of the LAM source code (LAM itself is very large --
there are hundreds and hundreds of C and C++ source files) where you
can find
specific answers.
|
2. Why does LAM use these annoying daemons? |
A common complaint about LAM/MPI is that it uses user-level daemons;
particularly in batch queue or other automatic execution environments,
some consider it inconvenient to launch and kill the LAM run-time
environment.
There are many good reasons that LAM/MPI uses daemons, and we are
in no hurry to get rid of them. Indeed, all major implementations of
MPI now include daemons of one flavor or another (yes, even that other
open source MPI implementation!). The fact is that some kind of
external agent is necessary for some MPI-2 functionality such as
MPI_COMM_SPAWN. LAM/MPI uses the daemons for other kinds
of functionality as well, including:
- Having an external agent that has already satisfied
security/authentication requirements allows for fast request execution
(e.g., mpirun).
- A meta-network of daemons does its own monitoring and can
guarantee cleanup when a user aborts an MPI job with Ctrl-C and/or
MPI_ABORT.
- Third party monitoring tools such as XMPI can tap into the daemon
meta-network to provide external monitoring of running jobs.
Additionally, starting with v7.0, the mpiexec command
can be used for "one-shot" MPI executions -- it will seamlessly boot
the LAM run-time environment, run the MPI process, and then take down
the LAM run-time environment. This hides all the background work from
the user.
We consider this functionality to be essential for a
robust parallel run-time environment, and therefore the gains from
this functionality greatly outweigh any inconvenience of
starting and stopping the LAM daemons.
Some have asked why LAM/MPI doesn't have a root-level daemon so
that users don't have to start up their own daemons with
lamboot. If we did that, we'd have to include an
authentication mechanism that, once satisfied that an incoming request
comes from a valid MPI user (many people run LAM/MPI on open
networks, so this security/authentication is necessary), would then
allow the requested action to be performed (perhaps fork off an MPI
program, or a user-specific MPI daemon for general use, or whatever).
However, this is exactly what rshd (or
sshd) already is. We don't see the need to duplicate
security/authentication functionality in an MPI implementation,
particularly when many robust, peer-reviewed solutions already exist.
The LAM Team believes that rshd and sshd
already supply this functionality well (root-level authentication
and launching user-specific commands). There's no need for us to
recreate this functionality and risk showing up on Bugtraq.
In batch queue and other kinds of automatic execution
environments, the setup and teardown of the LAM daemons can be
automated and hidden from the user (or the user can use
mpiexec with the -boot and related options).
Batch systems typically have inherent "setup" and "teardown" steps
where lamboot and lamhalt can be hidden
if the administrator/user desires. These steps can also be used to
guarantee the teardown for failure cases. Solutions for this (for
example, PBS epilogue scripts) have been posted on the LAM user list.
|
3. But no other MPI implementation uses daemons...? |
Actually, that's not true.
Even "that other freeware MPI implementation" uses daemons now; it's not the default, but daemons are there. All vendor MPI implementations use some kind of daemons as well (most are typically root-level, though, and have internal security/authentication mechanisms).
MPI-2 functionality such as MPI_COMM_SPAWN and MPI_PUBLISH_NAME requires external agents. Multi-threaded MPI implementations may be able to avoid some of these issues, but it's just a heckuvalot simpler to have an external agent (i.e., a daemon).
|
4. Is LAM multi-threaded? |
No, LAM is not multi-threaded. The message passing engine is
single-threaded, both in the stubs that are compiled into user LAM/MPI
programs as well as the LAM message passing daemons.
|
5. Is LAM thread safe? |
It depends on exactly what you mean by "thread safe".
LAM is "thread safe" in that MPI programs may use multiple threads.
It is not safe, however, to have multiple threads
simultaneously executing in the MPI library. If user programs
use multiple threads, they must ensure that only one thread uses
LAM at a time. Unpredictable results (read: crash and burn) will
occur if multiple threads access LAM simultaneously.
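One common way to meet the "only one thread at a time" requirement is to funnel every library call through a single process-wide lock. The following is a minimal sketch using POSIX threads, not code from LAM: `unsafe_library_call` is a hypothetical stand-in for any call into a library that must not be entered concurrently, and `safe_library_call`, `run_workers`, and `CALLS_PER_THREAD` are names invented for this illustration.

```c
#include <pthread.h>

#define CALLS_PER_THREAD 10000
#define MAX_THREADS 16

/* Hypothetical stand-in for a library that must not be entered by two
 * threads at once: an unprotected read-modify-write on shared state. */
long shared_state = 0;

void unsafe_library_call(void) {
    shared_state = shared_state + 1;
}

/* One process-wide lock that every thread takes before entering the library. */
pthread_mutex_t library_lock = PTHREAD_MUTEX_INITIALIZER;

void safe_library_call(void) {
    pthread_mutex_lock(&library_lock);
    unsafe_library_call();              /* only one thread is ever in here */
    pthread_mutex_unlock(&library_lock);
}

void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < CALLS_PER_THREAD; i++)
        safe_library_call();
    return NULL;
}

/* Start up to MAX_THREADS workers, wait for them, and return the final
 * value of the shared state. */
long run_workers(int nthreads) {
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_state = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, NULL) != 0)
            break;                      /* stop starting threads on error */
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return shared_state;
}
```

Calling `run_workers(4)` should return `4 * CALLS_PER_THREAD`, because the lock serializes every update. In a real program the lock would wrap each MPI call instead of the stand-in function.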
There are plans to allow multiple threads to execute simultaneously
within the MPI library. This will
take quite some time, however.
See the related question in the "Typical Setup of LAM" section of
the FAQ about how to use LAM in multi-threaded programs.
|
6. Does LAM/MPI provide asynchronous message passing progress? |
Yes and no.
True asynchronous message passing progress depends on the ability to have threads
inside of LAM/MPI -- a hidden thread could continue to make progress on sending and
receiving, regardless of what the user application is doing. Since LAM/MPI is
single-threaded, progress on non-blocking calls such as MPI_ISEND occurs
only when the user calls into the MPI library again to check for progress on these
calls. Hence, there may be no progress on an MPI_ISEND unless some flavor of
MPI_TEST or MPI_WAIT (or certain other MPI communication functions) is
invoked.
That being said, LAM does use "eager" message sending protocols for "short"
messages, where the exact definition of "short message" is different in each RPI -- it is
usually messages under a specific length (see the LAM/MPI User's Guide for a description of each of LAM's underlying message
passing devices). For example, by default, LAM's TCP RPI will eagerly try to send messages
under 64K (note that this default value is changeable at compile- and run-time). That is,
LAM will try to send the entire message during a call to MPI_ISEND.
Depending on current levels of operating system buffering, the entire message may be
sent immediately (or, more specifically, may be copied out of the process space into the
TCP stack's internal buffers). Hence, it is possible to get some level of asynchronous
progress when using short messages because of eager protocols. Keep in mind,
however, that operating system and/or device buffering is finite, so if a lot of short
messages are sent (perhaps
without corresponding receives on the receiver), it is possible that even eager sends will not
be sent immediately (e.g., MPI_ISEND is not able to send the message in
the first pass through the progression engine, and progress will only be made at the next
call to some flavor of MPI_TEST or MPI_WAIT).
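The interaction between eager sends, finite buffering, and library-driven progress can be illustrated with a toy model. This is a sketch only, not LAM's progression engine: every name (`toy_net`, `toy_request`, `toy_isend`, `toy_test`, `toy_drain`) is invented for this illustration, the 64K limit simply mirrors the TCP default mentioned above, and the model assumes that a message at or above the limit makes no progress at all until the next call into the "library".

```c
#include <stddef.h>

/* Assumption: mirrors the 64K default of the TCP RPI mentioned in the text. */
#define EAGER_LIMIT ((size_t)64 * 1024)

/* Toy model of the finite buffering in the OS or network device. */
typedef struct {
    size_t buffer_free;   /* bytes the "OS" will accept right now */
} toy_net;

/* Toy model of a non-blocking send request. */
typedef struct {
    size_t total;         /* message length in bytes */
    size_t sent;          /* bytes handed to the "OS" so far */
} toy_request;

/* One pass through the toy progress engine: hand over as many bytes as
 * the buffer currently accepts. */
void toy_progress(toy_net *net, toy_request *req) {
    size_t left = req->total - req->sent;
    size_t n = left < net->buffer_free ? left : net->buffer_free;
    req->sent += n;
    net->buffer_free -= n;
}

/* Non-blocking send: short messages get one eager pass immediately;
 * longer messages are only posted (a simplification of this model). */
void toy_isend(toy_net *net, toy_request *req, size_t len) {
    req->total = len;
    req->sent = 0;
    if (len < EAGER_LIMIT)
        toy_progress(net, req);
}

/* Completion check that also drives progress, which is the only way an
 * unfinished request advances in this single-threaded model.
 * Returns 1 when the whole message has been handed over, 0 otherwise. */
int toy_test(toy_net *net, toy_request *req) {
    if (req->sent < req->total)
        toy_progress(net, req);
    return req->sent == req->total;
}

/* The receiving side consumed data, freeing buffer space. */
void toy_drain(toy_net *net, size_t bytes) {
    net->buffer_free += bytes;
}
```

With 1000 bytes of free buffer, a first 800-byte `toy_isend` completes immediately, a second one stalls after 200 bytes, and it finishes only when `toy_drain` frees space and `toy_test` is called again. That is the pattern the paragraph above describes for eager sends under buffer pressure.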
Note that some networks provide some support for independent
progress. Myrinet and Infiniband, for example, have communication co-processors --
LAM simply gives a message to be sent to the co-processor and then returns control to
the user application. The co-processor can move the message across the network
independent of the user application behavior (i.e., "in the background"). However,
LAM/MPI will not recognize that this has happened until MPI_TEST or
MPI_WAIT (or another MPI
communication function) has been invoked. Additionally, LAM's internal flow control and
signaling will not occur until LAM's progression engine is invoked (during
MPI_TEST,
MPI_WAIT, and other communication functions). So even if an
MPI_ISEND is invoked on
a network with a local communication co-processor, there may only be limited progress
while execution is outside of LAM's progress engine.
Also, it should be noted that the lamd RPI does provide true
asynchronous message passing progress -- at a cost. The lamd RPI
immediately passes all MPI messages to the local LAM daemon. The messages are passed
from the local daemon to the receiving process' daemon (note that if the sending and
receiving processes are on the same node, the message will stay in the local daemon) and
are "ready for pickup" when the receiver process posts a corresponding receive. Simply
put, all messages are sent eagerly to the local LAM daemon, and the daemons provide "in
the background" progress for moving messages across the network. Remember that the
daemons are a different process, and can therefore make progress on message passing
while the user's application is not in the MPI library.
As such, the lamd RPI definitely offers asynchronous message
passing, but at the cost of added latency for two extra hops (from the sender process to
the local LAM daemon, and from the receiver's LAM daemon to the receiver's process).
Even with this additional latency, some applications can greatly benefit from the
asynchronicity -- some users have reported on the LAM mailing list seeing large overall
speedups of their applications using the lamd RPI.
|
7. How does I/O work in LAM? |
In the interests of scalability (and fast application startup), LAM
does not construct a TCP connection from every process back to the
user's terminal. Instead, LAM provides scalable remote I/O via the
LAM daemon processes and redirection.
Local processes (i.e. those on the node on which mpirun is invoked)
inherit stdin, stdout and stderr from mpirun.
Processes on remote nodes have their stdout and stderr redirected
to that of mpirun, and stdin is redirected to /dev/null.
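At the POSIX level, pointing a process's stdin at /dev/null is typically done with `open` and `dup2`. The sketch below shows that general technique only; it is not taken from the LAM source, and the function names `redirect_stdin_to_devnull` and `read_one_byte_from_stdin` are invented for this illustration.

```c
#include <fcntl.h>
#include <unistd.h>

/* Point this process's stdin at /dev/null.
 * Returns 0 on success, -1 on error. */
int redirect_stdin_to_devnull(void) {
    int fd = open("/dev/null", O_RDONLY);
    if (fd < 0)
        return -1;
    if (dup2(fd, STDIN_FILENO) < 0) {
        close(fd);
        return -1;
    }
    if (fd != STDIN_FILENO)
        close(fd);      /* stdin now refers to /dev/null; drop the extra fd */
    return 0;
}

/* Try to read one byte from stdin.
 * Returns the number of bytes read; 0 means end-of-file. */
long read_one_byte_from_stdin(void) {
    char c;
    return (long)read(STDIN_FILENO, &c, 1);
}
```

After a successful redirect, any read from stdin reports end-of-file immediately, which is what a process on a remote node would see if it tried to read interactive input.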
|
8. Can I have access to the LAM code repository? |
Yes. Because the LAM Team tries very hard to release stable and as-bug-free-as-possible
distributions, we tend to take a long time between major releases. However, there are
many useful new features (and bug fixes) in our internal Subversion repository that some
users have asked for access to. Additionally, for those who actually develop with the
internals of LAM/MPI, Subversion access gives the most up-to-date versions rather than the
periodic tarballs. As such, the LAM Team has decided to provide read-only access to
the LAM/MPI Subversion repository.
Be aware, however, that the Subversion checkouts are not guaranteed to be stable. For
the most part, we try very hard to not check in things that are broken, but this is an active
development tree -- bugs happen. This is actually another major reason that this tree has
been made available: peer review. If you find any bugs, please report them! Contributions,
suggestions, and comments are welcome.
To check out the Subversion development tree, see the LAM
Subversion web page.
|
9. Is LAM Y2K compliant? |
LAM is mostly Y2K compliant in the sense that (for the most part) LAM doesn't care what time it is.
However, there are at least 2 MPI functions that report the
time/date: MPI_Wtime() and MPI_Wtick().
These functions directly report whatever the underlying operating
system tells them is the current date and time. Hence, if the
underlying OS is not Y2K compliant, these two functions may report
inaccurate information. LAM cannot do anything to fix this.
Also, the LAM tracing mechanism depends on the time (as reported by the underlying operating system). That is, analyzers such as XMPI use the tracing information reported by LAM to visually display communication patterns. If the time that is reported by the underlying OS is incorrect, this trace information will also likely be incorrect.
In short, LAM does not specifically store the year anywhere in its source code. The only times that LAM uses are a conglomerate of the month, day, year, hour, minute, and second, as reported by the underlying OS. LAM specifically uses the tv_sec and tv_usec members of the structures returned by the gettimeofday library call.
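For readers unfamiliar with gettimeofday, the sketch below shows how its tv_sec and tv_usec members combine into a single wall-clock value in seconds. It illustrates the library call only and is not LAM's implementation; `timeval_to_seconds` and `wall_time_seconds` are names invented for this example.

```c
#include <stddef.h>
#include <sys/time.h>

/* Combine the two members of a struct timeval into seconds. */
double timeval_to_seconds(const struct timeval *tv) {
    return (double)tv->tv_sec + (double)tv->tv_usec / 1e6;
}

/* Wall-clock time in seconds since the Epoch, exactly as the operating
 * system reports it. Returns -1.0 if gettimeofday fails. */
double wall_time_seconds(void) {
    struct timeval tv;
    if (gettimeofday(&tv, NULL) != 0)
        return -1.0;
    return timeval_to_seconds(&tv);
}
```

The value is only as trustworthy as the system clock: the code performs no date arithmetic of its own, so any error in the OS's notion of time passes straight through.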
Additionally, other OS functions may depend on the system date and
time. If the underlying OS is not Y2K compliant and internal time
functions get hosed, LAM may behave unpredictably (because the OS is
behaving unpredictably). LAM cannot do anything about this either
(chances are that LAM will be the least of your problems at this
point, anyway).
Finally, normal disclaimers apply (see the disclaimer statement in
the LICENSE file in the LAM distribution).
|