
LAM FAQ: Debugging MPI programs under LAM/MPI

Table of contents:
  1. Are MPI programs truly portable?
  2. How does LAM launch user programs?
  3. How do I debug LAM/MPI programs?
  4. How do I launch a debugger for each rank in my process?
  5. How can I get a separate X window for each rank?
  6. What environment variables does LAM define?
  7. Can I run MPI programs with memory-checking tools such as bcheck, valgrind, or purify?
  8. Is LAM purify clean?
  9. Why does my memory-checking debugger report memory leaks in LAM?
  10. Why does my memory-checking debugger report "read from uninitialized" in LAM?
  11. What does the error message "One of the processes started by mpirun has exited with a nonzero exit code" mean?
  12. What does the error message "MPI_[function]: process in local group is dead (rank [N], MPI_COMM_WORLD)" mean?
  13. My application deadlocks in LAM/MPI; it doesn't deadlock in other MPI implementations. Why?

[ Return to FAQ ]


1. Are MPI programs truly portable?

Well, yes and no.

All conformant MPI programs will compile with any conformant MPI implementation. That is, if you write a correct MPI program, it should compile just about anywhere (most [if not all] major MPI implementations have the correct MPI API), so cross-compilation -- in terms of MPI calls -- is not much of an issue.

Indeed, we have "ported" several large scale MPI programs to multiple architectures with different MPI implementations without much trouble.

The catch is that not every MPI implementation is created equal. The MPI standard is very specific on some points and deliberately loose on others; it leaves leeway for the implementor to choose exactly what (and how) certain actions are performed. So even though your program will most likely compile under all existing MPI implementations, it may behave slightly differently with different implementations.

This is not a major concern, but it is something that the MPI programmer needs to be aware of. Indeed, most MPI-1 implementations are reasonably similar. However, several MPI-2 functions, for example, take MPI_Info arguments, which are specifically designed to be implementation-dependent. This means that even though the call to, for example, MPI_COMM_SPAWN is completely portable, the building of the MPI_Info argument for that call is not.

For that reason, LAM/MPI defines the preprocessor macro LAM_MPI to be 1. MPI programmers can use this for LAM-specific code, if necessary. For example:

#if LAM_MPI
  /* Do LAM-specific things here */
#endif
  MPI_Comm_spawn(...);
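
A slightly fuller sketch of the MPI_COMM_SPAWN case discussed above might look like the following (the info key, its value, and the "worker" program name are hypothetical; they only illustrate where implementation-specific code would go):

MPI_Comm intercomm;
MPI_Info info;

/* Build the (implementation-dependent) MPI_Info argument.  The info
   key used below is purely hypothetical -- consult your MPI
   implementation's documentation for the keys it actually accepts. */
MPI_Info_create(&info);
#if LAM_MPI
  MPI_Info_set(info, "some_lam_specific_key", "some_value");
#endif

/* The call itself is completely portable */
MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, info, 0,
               MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
MPI_Info_free(&info);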

[ Top of page | Return to FAQ ]


2. How does LAM launch user programs?

This question is relevant to debugging; understanding the basics of how LAM launches user programs will help in debugging by allowing you to take advantage of some of LAM's features with shell scripts.

When you use mpirun to invoke a program on a remote node (either with an app schema, or by a simple command line invocation), the command (and any arguments) to be executed is sent to the remote LAM daemon.
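
For reference, an application schema is simply a text file that lists which programs to run on which nodes. A minimal sketch (the node syntax, program names, and file name are illustrative):

# Contents of the file "my_appschema": one "master" on node n0, and
# one "worker" on each of nodes n1 through n3
n0 master
n1-3 worker

% mpirun my_appschema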

The remote LAM daemon does the following (in no particular order):

  • Forks a new Unix process
  • Sets any exported environment variables from mpirun
  • Sets several environment variables (mainly for internal use) that contain all the relevant MPI state information, including all run-mode command line switches given to mpirun (this is how LAM initializes your process without adding extra command line arguments)
  • Redirects stdin and stdout
  • exec's the user's command (with all command line arguments)

The process runs until it (or one of its children) executes MPI_INIT. MPI_INIT will perform some communication with mpirun to obtain the location and identification of all other ranks in MPI_COMM_WORLD. Once each rank knows about all other ranks, if the program was launched with C2C mode, each rank will perform a "dance" to obtain a direct socket to each other rank.

Once all of this has been accomplished, MPI_INIT returns and the user program progresses as normal.

Hence, if you use mpirun to launch a non-MPI program (one that never reaches MPI_INIT), mpirun will hang while waiting for the program to send back its location information.

[ Top of page | Return to FAQ ]


3. How do I debug LAM/MPI programs?

mpirun can be used to launch non-MPI programs (as long as the programs that you run eventually launch LAM/MPI programs).

Reading between the lines, this means that you can use mpirun to launch debuggers on remote nodes. This is especially helpful to find race conditions, memory problems, etc., that were previously very difficult to find because LAM was not "debugger friendly".

Many users still prefer printf-style debugging (i.e., inserting printf (C) or WRITE (Fortran) statements throughout their code), but this is haphazard, ends up littering your code with spurious output, and can be a serious detriment to performance -- output to the screen is extremely slow compared to the FLOPS that a computer is capable of.

Speaking from experience, the LAM Team has found the use of debuggers to be extremely helpful in debugging MPI programs. We highly recommend it over printf debugging (which is why we built in the ability to have mpirun execute non-LAM/MPI programs).

[ Top of page | Return to FAQ ]


4. How do I launch a debugger for each rank in my process?
Applies to LAM 6.3 and above

Since all ranks except rank 0 have their stdin tied to /dev/null, it is necessary to launch text-based debuggers (such as gdb) in separate X windows. If you are using a GUI-based debugger, you can simply mpirun that debugger directly on each node.

For GUI debuggers, you will probably need to export the DISPLAY environment variable.
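
For example (a sketch; ddd is used here only as an example of a GUI debugger that may or may not be installed on your nodes):

% mpirun N -x DISPLAY ddd my_mpi_program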

NOTE: If you are using the rsh boot SSI module with the ssh remote agent, you cannot use SSH's default X forwarding. This is because SSH's X forwarding only exists while ssh is running, but ssh will have completed and exited normally before a successful lamboot completes. Hence, you must generate your own DISPLAY that is suitable for remote nodes to write to your display.

For text debuggers, you will need a short shell script to launch an xterm (or whatever your favorite X terminal program is -- not all systems have xterm; others such as konsole or gnome-terminal can be used instead). For example:

% mpirun N -x DISPLAY run_gdb.csh my_program_name

Where run_gdb.csh is a shell script, and my_program_name is the name of your LAM/MPI executable. An example run_gdb.csh is shown below:

#!/bin/csh -f

echo "Running GDB on node `hostname`"
xterm -e gdb $*
exit 0

Also note that the DISPLAY environment variable is exported to the remote nodes with mpirun. This is necessary so that the remote nodes know where to send the X display of the xterm. Be sure that the DISPLAY contents are suitable for sending to your display (e.g., setting it to "your_hostname:0" will be suitable on many systems) and that the host you are running on has remote access for X enabled. You may need to see the man pages for xauth(1) and/or xhost(1) for more information on remote X display authentication.
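
For example, from a csh-like shell on the host where you run mpirun (the host names here are purely illustrative; substitute your own workstation and node names):

% setenv DISPLAY my_workstation.example.com:0
% xhost +node1.example.com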

If you are not running in an X environment, or wish to debug only one process, you can use a script such as:

#!/bin/csh -f

if ("$LAMRANK" == "0") then
  gdb $*
else
  $*
endif
exit 0

[ Top of page | Return to FAQ ]


5. How can I get a separate X window for each rank?
Applies to LAM 6.3 and above

Sometimes it is desirable to launch each rank's process in a different window. This allows separation of output (i.e., each rank's output appears in its own window), the use of stdin on each rank, etc. This can be especially handy when debugging LAM/MPI programs.

You will need a short shell script to accomplish this. For example, the following example script is named run_xterm.csh:

#!/bin/csh -f

echo "Running xterm on `hostname`"
xterm -e $*
exit 0

This will run an xterm with the specified arguments as the command running in that window (note that not all systems have xterm -- other terminal programs such as konsole or gnome-terminal can be used instead). The following mpirun command can be used to launch this script:

% mpirun C -x DISPLAY run_xterm.csh my_mpi_program

Note how my_mpi_program is given as an argument to run_xterm.csh; the script then passes it (via $*) to the xterm command.

Also note that the DISPLAY environment variable is exported to the remote nodes with mpirun. This is necessary so that the remote nodes know where to send the X display of the xterm. Be sure that the host you are running on has remote access for X enabled. You may need to see the man pages for xauth(1) and/or xhost(1).

NOTE: If you are using the rsh boot SSI module with the ssh remote agent, you cannot use SSH's default X forwarding. This is because SSH's X forwarding only exists while ssh is running, but ssh will have completed and exited normally before a successful lamboot completes. Hence, you must generate your own DISPLAY that is suitable for remote nodes to write to your display.

[ Top of page | Return to FAQ ]


6. What environment variables does LAM define?
Applies to LAM 6.3 and above

Before executing user programs, LAM defines several environment variables to be inherited by the user process. While the majority of the variables are only meaningful inside of LAM, the LAMRANK environment variable may be useful to the user.

The LAMRANK variable will contain a number from 0 to (n-1), and indicates what rank the process will be in MPI_COMM_WORLD. This variable can be used to make decisions at execution time, especially if a shell script is launched via mpirun. Consider the following shell script:

#!/bin/csh -f

# $* will contain the name of the executable to run, as well as
# all the arguments that were passed in from mpirun

# This will run the user program (with all arguments from mpirun)
# and direct the output to the files "mpi_output.0" through
# "mpi_output.(n-1)"

$* > mpi_output.$LAMRANK

exit 0
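
If this script were saved as, say, save_output.csh (a file name used here purely for illustration), it could be launched with:

% mpirun C save_output.csh my_mpi_program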

Also note that to launch LAM/MPI executables from within a shell script that was itself launched by mpirun, you just execute them directly. Do not use mpirun from within the script!

[ Top of page | Return to FAQ ]


7. Can I run MPI programs with memory-checking tools such as bcheck, valgrind, or purify?

Yes. Since LAM allows you to mpirun non-MPI programs, you can either mpirun bcheck or valgrind directly, or write a short shell script to perform some "smart" execution decisions to limit your output. For example, the following script will only invoke bcheck (the Solaris native memory-checking debugger) on rank 0, and ensure that the output report files are in a specific directory:

#!/bin/csh -f

# Only have rank 0 execute bcheck.  The LAMRANK environment
# variable contains a number from 0 to (n-1).

if ($LAMRANK == "0") then
  # Make a directory based upon the host name
  set host=`hostname`
  if (! -d $host) mkdir $host
  cd $host
  bcheck -all $*
else
  # If we are not rank 0, just run the executable (and all of
  # its arguments)
  $*
endif

exit 0
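
Alternatively, a memory checker can be run directly on every rank without a wrapper script; for example (a sketch -- the valgrind option shown is illustrative):

  shell$ mpirun C valgrind --leak-check=yes my_mpi_program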

Purify is slightly different -- the purify command must be used to compile the actual MPI application. Unfortunately, it seems that at least some versions of Purify don't understand the LAM wrapper compilers (mpicc, mpiCC, and mpif77). Hence, the typical solution is to have the LAM wrapper compilers invoke purify (instead of the other way around). Specifically, the following won't work:

  shell$ purify mpicc my_application.c -o my_application

Instead, tell the LAM wrapper compilers to use purify as the underlying compiler. For example, to set the underlying compiler that the mpicc wrapper compiler uses, set the environment variable LAMMPICC. For Bourne-like shells:

  shell$ LAMMPICC="purify cc"
  shell$ export LAMMPICC
  shell$ mpicc my_application.c -o my_application

For csh-like shells:

  shell% setenv LAMMPICC "purify cc"
  shell% mpicc my_application.c -o my_application

The LAMMPICXX and LAMMPIF77 environment variables can be used to override the underlying compilers for the mpiCC / mpic++ and mpif77 wrapper compilers, respectively.

Note that the older (deprecated) environment variable names LAMHCC, LAMHCP, and LAMHF77 also still work for version 7.0 and above; these are the only names that work prior to version 7.0.

WARNING: Do not arbitrarily change the back-end compiler in the wrapper compilers; badness (read: seg faults and other strange behavior in MPI applications) can occur if you arbitrarily mix vendor compilers. For example, this kind of behavior can occur if LAM was configured and compiled with one compiler and you change the back end of the wrapper compilers to use a different compiler suite.

[ Top of page | Return to FAQ ]


8. Is LAM purify clean?

When compiled with the --with-purify option to configure, LAM 6.3 is purify clean (--with-purify is not the default for configure because it causes a slight performance hit inside of LAM). LAM will function correctly with or without --with-purify.
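
For example, a rebuild with this option might look like the following (a sketch; add whatever other configure options your site normally uses):

  shell$ ./configure --with-purify
  shell$ make
  shell$ make install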

[ Top of page | Return to FAQ ]


9. Why does my memory-checking debugger report memory leaks in LAM?

As far as we know, we have plugged all memory leaks in the LAM code. However, there are a few leaks from various operating system calls that we can't do anything about (for example, getpwuid() on Solaris 2.6 leaks a few bytes).

If you find any other memory leaks, please let the LAM Team know so that they can be fixed in future releases.

[ Top of page | Return to FAQ ]


10. Why does my memory-checking debugger report "read from uninitialized" in LAM?

LAM has a standard message structure that it uses for most internal communications. This structure has several fields that are not used for all types of communications. Where fields are not used, they are not initialized, for the sake of optimization. So when the message is sent, the entire message structure is sent -- including the uninitialized values. This is not a problem for LAM, because the receiver will ignore these fields, but it does generate "read from uninitialized" warnings on the sending side when using memory-checking debuggers.

The --with-purify option to the LAM configure script enables code within LAM that zeros out all message structures before they are used. This must be selected at compile time: the zeroing code is conditionally compiled into LAM, so it is a compile-time decision, not a run-time decision.

Using LAM with the --with-purify option may cause a slight performance hit, particularly when using the shared memory RPIs. Most users won't notice the extra overhead, though, since LAM's internal message headers are a small, constant size (i.e., the zeroing overhead is the same for a 1 byte message as for a 1 MB message).

[ Top of page | Return to FAQ ]


11. What does the error message "One of the processes started by mpirun has exited with a nonzero exit code" mean?

This means that at least one MPI process exited after invoking MPI_INIT but before invoking MPI_FINALIZE.

It typically indicates an error in the MPI application. LAM will abort the entire MPI application upon this error. The last line of the error message indicates the PID, node, and exit status of the failed process (note that there may be multiple failed processes -- LAM will only report the first one).

If this is happening to your application, it is recommended that you run your application through a memory checking debugger (such as Valgrind, Bcheck, or Purify) and look for buffer overflows, erroneous memory usage, or other kinds of subtle memory problems. Be sure to read the FAQ "Can I run MPI programs with memory-checking tools such as bcheck, valgrind, or purify?".

[ Top of page | Return to FAQ ]


12. What does the error message "MPI_[function]: process in local group is dead (rank [N], MPI_COMM_WORLD)" mean?

This means that some MPI function tried to communicate with a peer MPI process and discovered that the peer process is dead.

Common causes of this problem include attempting to communicate with processes that have failed (which, in some cases, won't generate the "One of the processes started by mpirun has exited..." error message) or that have already invoked MPI_FINALIZE. Do not initiate communication that could involve processes that have already invoked MPI_FINALIZE; this includes using MPI_ANY_SOURCE, or using collectives on communicators that include processes that have already finalized.
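
For example, a pattern like the following (a minimal sketch, not taken from any particular application) can trigger this error: rank 1 finalizes without ever sending, while rank 0 posts a wildcard receive that could match a message from rank 1:

if (rank == 0) {
  /* This receive could match a message from any rank, including
     rank 1, which never sends and finalizes immediately */
  MPI_Recv(buf, 1, MPI_INT, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
  /* Does no communication at all */
}
MPI_Finalize();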

[ Top of page | Return to FAQ ]


13. My application deadlocks in LAM/MPI; it doesn't deadlock in other MPI implementations. Why?

A common MPI application portability mistake is assuming that sends are buffered. This is described in detail in the MPI-1 standard.

Consider the following code.

if (rank == 0) {
  MPI_Send(..., 1, tag, MPI_COMM_WORLD);
  MPI_Recv(..., 1, tag, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
  MPI_Send(..., 0, tag, MPI_COMM_WORLD);
  MPI_Recv(..., 0, tag, MPI_COMM_WORLD, &status);
}

When the messages are not buffered, rank 0's MPI_SEND does not complete until rank 1's MPI_RECV is posted, and similarly, rank 1's MPI_SEND does not complete until rank 0's MPI_RECV is posted. Since each rank blocks in MPI_SEND before reaching its MPI_RECV, the result is a deadlock. The only case in which this does not deadlock is when the MPI implementation decides to buffer the MPI_SEND, thereby allowing the MPI_RECV to be posted. However, such code is not portable, since this way of avoiding deadlock is network, implementation, and potentially message size dependent.

There are several ways to fix this problem. Here are a few:

  • Reverse the order of one of the send/receive pairs:

    if (rank == 0) {
      MPI_Send(..., 1, tag, MPI_COMM_WORLD);
      MPI_Recv(..., 1, tag, MPI_COMM_WORLD, &status);
    } else if (rank == 1) {
      MPI_Recv(..., 0, tag, MPI_COMM_WORLD, &status);
      MPI_Send(..., 0, tag, MPI_COMM_WORLD);
    }
    
  • Make at least one of the MPI_SENDs non-blocking (MPI_ISEND):

    if (rank == 0) {
      MPI_Isend(..., 1, tag, MPI_COMM_WORLD, &req);
      MPI_Recv(..., 1, tag, MPI_COMM_WORLD, &status);
      MPI_Wait(&req, &status);
    } else if (rank == 1) {
      MPI_Recv(..., 0, tag, MPI_COMM_WORLD, &status);
      MPI_Send(..., 0, tag, MPI_COMM_WORLD);
    }
    
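  • Use MPI_SENDRECV, which the MPI standard provides for exactly this kind of paired exchange (a sketch, with arguments elided in the same style as the examples above):

    if (rank == 0) {
      MPI_Sendrecv(..., 1, tag, ..., 1, tag, MPI_COMM_WORLD, &status);
    } else if (rank == 1) {
      MPI_Sendrecv(..., 0, tag, ..., 0, tag, MPI_COMM_WORLD, &status);
    }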

[ Top of page | Return to FAQ ]