[ Introduction |
Preliminary setup |
Compiling MPI programs |
Booting LAM/MPI |
Running MPI programs |
Shutting down LAM/MPI ]
5. RUNNING LAM
5.1 mpirun
SPMD
To run your MPI binaries, use the command mpirun. For
example, to run the sample program presented above (assuming that the
binary is called "hello"):
shell$ cd /afs/nd.edu/path/to/where/your/program/is
shell$ mpirun C hello
The C means "run hello on every CPU that you
lambooted on." Alternatively, if you want to run only one copy of
hello on each node, use N instead of
C.
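As a rough illustration of the difference between C and N, the sketch
below maps a placement mode to the corresponding mpirun argument and
prints the resulting command lines. The placement_flag helper is made up
for this example; it is not part of LAM.

```shell
# Hypothetical helper (not part of LAM): translate a placement mode
# into the mpirun argument described above.
#   cpu  -> C : one copy on every CPU that was lambooted
#   node -> N : one copy on every node
placement_flag() {
    case "$1" in
        cpu)  echo C ;;
        node) echo N ;;
        *)    echo "usage: placement_flag cpu|node" >&2; return 1 ;;
    esac
}

# Print the two command lines rather than running them.
echo "mpirun $(placement_flag cpu) hello"
echo "mpirun $(placement_flag node) hello"
```

For instance, if you lambooted four nodes with two CPUs each, the C form
would start eight copies of hello and the N form would start four.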
mpirun has many more options that can be supplied -- see
the mpirun(1) man page and/or the output of
"mpirun -help" for more details.
BUT WAIT! There's an
easier way. :-) The lam_cshrc script that you sourced
defines an alias named lamrun, which automatically
supplies several common command line parameters to mpirun
as well as the present working directory for your program. So instead
of doing:
shell$ mpirun -O C /afs/nd.edu/...your_path.../hello
you can do:
shell$ lamrun hello
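The actual alias is defined in lam_cshrc and is not reproduced here. The
following is only a sketch of what such a wrapper could look like,
written as a Bourne shell function. The name lamrun_sketch is
hypothetical, and the choice of -O and C as the supplied parameters is
an assumption taken from the mpirun line above.

```shell
# Sketch only: a stand-in for the lamrun alias, NOT the real definition.
# Assumes the "common parameters" are -O and C, as in the example above.
lamrun_sketch() {
    prog=$1
    shift
    # Prefix the program name with the present working directory and
    # pass any remaining arguments through to the program.
    mpirun -O C "$PWD/$prog" "$@"
}

# Usage (from the directory that holds the binary):
#   lamrun_sketch hello
```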
NOTE: The ND environment
has a "killer" program that kills any process that has used more than
1 CPU minute and is not owned by the person logged on to the console
of that machine. That is, your LAM daemons can be killed
without notice, and when you then try to "mpirun" a program, it just seems
to hang. The tping command is very
useful for determining whether all your LAM daemons are still operating. If
they are not, you need to perform a wipe and another lamboot.
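The check-and-recover sequence can be sketched as a small shell
function. Several things here are assumptions rather than statements
from this tutorial: the helper name lam_recover, the "-c 1" count
option to tping, the -v flags, the use of tping's exit status to
detect failure, and the name of the host file that you pass in (it
should be whatever file you originally gave to lamboot).

```shell
# Sketch only. $1 is the host file that was used for the original lamboot.
lam_recover() {
    hostfile=$1
    # Assumption: tping returns non-zero when the daemons cannot be reached.
    if tping -c 1 N; then
        echo "LAM daemons are responding"
    else
        echo "LAM daemons are not responding; restarting LAM"
        wipe -v "$hostfile"      # tear down whatever is left of the old LAM
        lamboot -v "$hostfile"   # start fresh daemons on the same nodes
    fi
}

# Usage:
#   lam_recover /path/to/your/hostfile
```

Should tping itself not return, interrupt it and run the wipe and
lamboot steps by hand.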
MPMD
Although it is common to write SPMD code, LAM can also handle the
MPMD style of executing programs (i.e., executing different
binaries on different ranks).
Instead of giving mpirun the name of a single binary,
you give mpirun the name of an application
schema file. The application schema (or "appschema") simply
lists the nodes that you want to use, and the name of the binary to
execute on each (along with any relevant command line options that
your binary may require).
For example, the following appschema starts master on
n0, and starts slave on all the other nodes
(n1-7, in this case). Note that we're passing some flags
to the slave program, too:
n0 master
n1-7 slave -verbose -loadbalance
To run this appschema, you still use mpirun, but no longer
need to specify nodes or an application name -- you simply specify the
appschema file name (let's say that the above example's file name is
esha-homework):
shell$ mpirun esha-homework
This will start the respective binaries on their respective nodes.
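If you would rather create the appschema from a script than from an
editor, something along these lines should work. The write_appschema
helper is hypothetical; the file contents and the file name are the ones
from the example above.

```shell
# Hypothetical helper: write the example appschema to the file named in $1.
write_appschema() {
    cat > "$1" <<'EOF'
n0 master
n1-7 slave -verbose -loadbalance
EOF
}

write_appschema esha-homework
cat esha-homework    # show what mpirun will read

# Then launch it (requires a booted LAM with nodes n0 through n7):
#   mpirun esha-homework
```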
5.2 mpitask
Analogous to the sequential UNIX ps command is
mpitask, which displays the current status of the MPI
program(s) being executed. The -h command line option
provides a brief synopsis of this command.
5.3 mpimsg
Similar to the mpitask command, the mpimsg
command gives information about running MPI programs. mpimsg
shows all pending messages in the current MPI environment. With
mpimsg, you can see messages that are "left over" (i.e.,
messages that are never received) even after your MPI program has
completed.
This command is not very useful if you are running in the
"client-to-client" mode in LAM/MPI (which is the default). You must
explicitly specify -lamd on your mpirun
command line for this command to work as expected.
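Put together, a debugging run might look like the sketch below. The
helper name run_and_check_messages is made up for illustration, and the
use of C to place the program is just carried over from the earlier
mpirun examples.

```shell
# Sketch only: run a program with -lamd, then list any leftover messages.
run_and_check_messages() {
    prog=$1
    mpirun -lamd C "$prog"   # -lamd instead of the default client-to-client mode
    mpimsg                   # show messages still pending after the run
}

# Usage:
#   run_and_check_messages hello
```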
REMEMBER: Correct MPI programs do not leave messages
lying around; all messages should be received during the run of your
program.
5.4 lamclean
To kill the running MPI program and erase all pending messages, use
lamclean:
shell$ lamclean -v
NOTE: lamclean should only have to be
used for debugging -- i.e. programs that hang, messages that are left
around, etc. Correct MPI programs should terminate properly and clean
up all their messages.