
Manual page for XMPI 2.2


SYNTAX

     xmpi [-h] [<boot_schema>]


DESCRIPTION

     XMPI is a graphical user interface for running MPI programs,
     monitoring MPI processes and messages, and viewing execution
     trace files.  It exploits the debugging capabilities of LAM,
     a parallel computing environment for UNIX clusters.  XMPI is
     constructed from the Motif widget set.

     XMPI does not provide an interface for starting a LAM
     session.  This must be done before running XMPI, which is
     itself a LAM program.  The boot schema from which LAM was
     started can (and should) be provided to XMPI so that it can
     be presented as an inventory of nodes on which programs may
     be run.  If XMPI is to be used only to view trace files,
     starting LAM is not required.

     This description assumes a basic knowledge of MPI.


TYPICAL USAGE

     XMPI provides a graphical display of the state of the
     processes within an MPI application.  The state information
     is obtained from one of two sources: a running application
     started by XMPI, or a file containing trace data from a
     traced MPI application.  When XMPI is started, its top-level
     overview window is blank.  Once an application is started or
     a trace file is loaded, the overview window fills with a
     tiled group of hexagons, each representing the state of one
     MPI process and labeled by the process rank within
     MPI_COMM_WORLD.  A traffic light symbol indicates whether
     the process is running or blocked.  No traffic light is
     shown for processes that have either finalized or not yet
     initialized the MPI library.

     When monitoring a running application, the camera "Snap"
     button or the "Snapshot" item in the "Application" menu
     updates the state information on all processes at any time.
     When viewing trace data, the state information is updated
     according to the currently selected time point (see "XMPI
     TRACE FILES").

     A mouse click inside a hexagon pops up an additional window
     containing more detailed information about the process.  If
     the process is blocked, the function name, peer process
     rank, communicator, message tag and element count are
     displayed.  If unreceived messages are available, their
     quantity, source process rank, communicator, message tag and
     element count are displayed.  By leaving a few process
     windows on the screen, a user can focus debugging on a small
     and manageable collection of misbehaving processes.

     The "Clean" button or "Clean" item in the "Application" menu
     terminates an application, after which the development cycle
     can be repeated.  The previous application can be rerun with
     the "Rerun" button or the "Rerun" item in the "Application"
     menu.


RUNNING AN APPLICATION

     An application schema specifies an MPI application by
     listing each process's program name, program location,
     target processor(s) and optional command-line arguments.

     The "Browse&Run" item in the "Application" menu pops up a
     simple file browser for choosing and running a pre-written
     application schema.  Alternatively, an application schema
     can be configured with the XMPI application builder dialog,
     invoked by the "Build&Run" item in the "Application" menu.

     The builder dialog has an area to specify each  process  and
     an  arrow  button to add it to the application schema, which
     is shown below the arrow button in  a  scrolled  list.   The
     lines  in  the  list  show  the syntax that would be used in
     creating the same application with a text  editor.   Indeed,
     the "Save" button saves the application schema in a file for
     later use and/or editing.

     A specified process does not become part of the  application
     until the arrow (commit) button is pressed.  Once it appears
     in the application scrolled list, a process can  be  deleted
     by selecting it and pressing the <Delete> key.

     Pressing the "Run" button with anything in  the  application
     list causes that application to be run.  The overview window
     is then initialized with the status of the application.

  Program Specification
     A file browser in the middle of the builder dialog aids in
     selecting a program file.  The browser only navigates the
     file space of the node running XMPI.  If a program is
     located on a node outside this file space (for example, on a
     file system not shared via NFS), its pathname may need to be
     typed into the process specification area.  Selecting the
     "Use Full Pathname" toggle button causes programs to be
     placed into the application schema as full pathnames.

     XMPI limits the choice of a program source  node  to  either
     the  node  running  XMPI  or  the  process target node.  The
     latter case is the default and is the most efficient because
     LAM  does  not  need  to transfer the program from source to
     target node.  The "Transfer Program" toggle  button  selects
     the source node policy.

  Multiple Program Copies
     The number of copies of a program to be run can  be  set  in
     the  process  specification area.  Clicking on the increment
     or decrement arrow will increment or decrement the count  by
     one.  Clicking  with  the  shift  key down will increment or
     decrement by ten.

  Command-line Arguments
     Command-line  arguments  must  be  typed  into  the  process
     specification area.

  Node Specification
     A boot schema specifies the computers participating as nodes
     in  a  LAM  multicomputer.   If  XMPI is given a boot schema
     filename, its contents will appear in a scrolled list on the
     right  side  of the builder dialog. XMPI will search for the
     given schema  in  the  local  directory.   The  boot  schema
     filename is displayed above the list of its nodes.  Multiple
     target nodes can be selected from the scrolled list, with the
     corresponding node mnemonics appearing in the process
     specification area.  Selecting multiple target nodes
     specifies multiple processes with the program name,
     arguments and source node policy held constant.

     If no boot schema was specified, only the special node
     selectors "LOCAL" (meaning the node on which XMPI is running)
     and "ALL NODES" are provided.

     Target node descriptions may also be typed directly into the
     process specification area.  The local node is specified as
     h.  The origin node from which the LAM multicomputer was
     booted, if not local, can be specified as o.  All usable
     nodes are specified as N.  Nodes are generically identified
     as n<list>, where <list> can be a single node identifier or
     a list of node identifiers.  Identifiers can be written in
     decimal or hexadecimal notation.  Examples are n1 or
     n0-7,0x10.
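     The n<list> notation can be illustrated with a small
     expansion routine.  The following is a minimal C sketch, not
     LAM's own parser: the function names parse_id and
     parse_node_list, the caller-supplied output array, and the
     assumption that ranges are inclusive and ascending are all
     hypothetical.  Only the n prefix, comma-separated
     identifiers, ranges such as 0-7 and hexadecimal identifiers
     such as 0x10 come from the text above; the special selectors
     h, o and N are not handled.

```c
#include <stdlib.h>

/* Hypothetical helper: parse one node identifier at p, written either in
 * decimal ("16") or hexadecimal with a 0x prefix ("0x10"). On success,
 * stores the value and the position just past it, and returns 0. */
static int parse_id(const char *p, long *val, const char **rest)
{
    char *end;
    int base = 10;

    if (*p < '0' || *p > '9')            /* reject signs and blanks */
        return -1;
    if (p[0] == '0' && (p[1] == 'x' || p[1] == 'X'))
        base = 16;                       /* strtol skips the 0x prefix */
    *val = strtol(p, &end, base);
    if (end == p)
        return -1;
    *rest = end;
    return 0;
}

/* Hypothetical helper: expand a specifier such as "n0-7,0x10" into node
 * identifiers. Returns the number written to out, or -1 on a syntax
 * error or if more than max_out identifiers would be produced. */
int parse_node_list(const char *spec, long *out, int max_out)
{
    const char *p = spec;
    int n = 0;

    if (*p++ != 'n')                     /* generic node specifiers start with n */
        return -1;

    for (;;) {
        long lo, hi;

        if (parse_id(p, &lo, &p) != 0)
            return -1;
        hi = lo;
        if (*p == '-') {                 /* inclusive range, e.g. 0-7 */
            if (parse_id(p + 1, &hi, &p) != 0 || hi < lo)
                return -1;
        }
        for (long id = lo; id <= hi; id++) {
            if (n >= max_out)
                return -1;
            out[n++] = id;
        }
        if (*p == ',')                   /* another identifier or range follows */
            p++;
        else
            return (*p == '\0') ? n : -1;
    }
}
```

     Under these assumptions, parse_node_list("n0-7,0x10", ids,
     32) yields nine identifiers: 0 through 7, then 16.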

  Run-time Options
     Applications can be run with various run-time options that
     specify the behaviour of the MPI library.  These can be
     configured from a separate dialog, which is activated from
     the "Runtime" item in the "Options" menu.  Options remain in
     effect until changed.

     o     tracing mode (default enabled)

     o     fast client-to-client communication (default disabled)

     o     GER protocol and error detection (default enabled)

     o     homogeneous LAM node optimization (default disabled)


FOCUSING ON A PROCESS

     More information on a process's state  can  be  obtained  by
     clicking  the  left mouse button within the process hexagon.
     This will pop up a focus window.   The  upper  area  of  the
     focus  window  is  the process area and displays the current
     state of the process.  The lower area is  the  message  area
     and displays information on the process's message queue.

     The focus window banner contains a tack button, which can be
     clicked to dismiss the window, and a label containing the
     process's identity along with the program name.  In XMPI, a
     process is identified first by its rank in MPI_COMM_WORLD
     and, if the process is communicating, by a slash followed by
     its rank within the current communicator.  The focus window
     can also be dismissed by clicking once again in the process
     hexagon.

     The process area describes the current state of the process
     together with the name of, and (where appropriate) the
     arguments to, the MPI function currently being executed.
     The layout is fairly self-explanatory, and we describe only
     the less obvious features.

  Communicator Identification
     The "comm" area shows the communicator being used in the
     current MPI function.  Communicators are opaque objects
     that MPI does not identify in any meaningful, printable
     way.  LAM's MPI implementation adds a simple numerical
     identifier to communicators, which is displayed in XMPI as
     <x>, where x is the identifier.  This identifier can be
     matched to communicator variables in an MPI program with the
     LAM function MPIL_Comm_id(2).

  Group Membership
     The button to the right of the "comm" area will highlight in
     the  overview  window  the  hexagons of the processes in the
     communicator.  For an intracommunicator, the  hexagons  will
     be  highlighted  in  the  color  specified  by the "lcomCol"
     resource.  For an intercommunicator, processes in the  local
     group  will  be  highlighted  in  the color specified by the
     "lcomCol" resource and those in  the  remote  group  in  the
     color  specified by the "rcomCol" resource.  For highlighted
     processes the process identification at the  bottom  of  the
     hexagon is changed to be the rank in MPI_COMM_WORLD followed
     by  a  slash  and  the  rank  in  the   communicator   being
     highlighted.

  Datatype
     The datatype button to the right of the "cnt" area will
     display in the datatype window (see "DATATYPE WINDOW") the
     type map of the datatype argument to the current MPI
     function.

     The message area describes the current state of the queue of
     messages destined to the process and not yet received.  Once
     again, the layout is fairly self-explanatory, and we describe
     only the less obvious features.

  Message Aggregates
     Identical undelivered messages are aggregated.  The "copy"
     area shows the number of messages within the visible
     aggregate, followed by the total number of messages in the
     queue.  The button to the right of the "copy" area cycles
     through the message aggregates.
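     Aggregation of identical messages can be sketched as a
     grouping pass over the queue.  This is an illustration only,
     not XMPI's code: the structures msg and aggregate, the
     function aggregate_queue, the representation of a datatype
     as an int handle, and the choice of fields that make two
     messages "identical" (source rank, communicator, tag,
     element count and datatype, the items the focus window
     displays) are all assumptions.

```c
#include <stddef.h>

/* Hypothetical description of one undelivered message; the fields mirror
 * what the focus window shows. Representing the communicator and the
 * datatype as int handles is an assumption. */
struct msg {
    int src, comm, tag, count, dtype;
};

struct aggregate {
    struct msg m;    /* representative message */
    int copies;      /* number of identical messages in this aggregate */
};

static int same_msg(const struct msg *a, const struct msg *b)
{
    return a->src == b->src && a->comm == b->comm && a->tag == b->tag &&
           a->count == b->count && a->dtype == b->dtype;
}

/* Group identical messages, keeping aggregates in order of first
 * appearance in the queue. out must have room for n entries.
 * Returns the number of aggregates written. */
size_t aggregate_queue(const struct msg *q, size_t n, struct aggregate *out)
{
    size_t nagg = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < nagg; j++) {
            if (same_msg(&out[j].m, &q[i])) {
                out[j].copies++;         /* joins an existing aggregate */
                break;
            }
        }
        if (j == nagg) {                 /* first message of a new aggregate */
            out[nagg].m = q[i];
            out[nagg].copies = 1;
            nagg++;
        }
    }
    return nagg;
}
```

     For a queue of n messages, the "copy" area for visible
     aggregate k would then correspond to out[k].copies followed
     by n.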

  Source Rank
     The "src" area shows the rank of the source  process  within
     MPI_COMM_WORLD followed by the rank of the source process in
     the communicator in which the message was sent.

  Datatype
     The datatype button to the right  of  the  "cnt"  area  will
     display in the datatype window the type map of the message's
     datatype.

  Group Membership
     The button to the right of the "comm"  area  will  highlight
     the message communicator in the manner previously described.


XMPI TRACE FILES

     XMPI can be used to view existing trace  files  and  can  be
     used to create trace files for applications run under XMPI.

     To load and view an existing trace file, select the "View"
     item in the "Trace" menu.

     If an application is run under XMPI with tracing enabled
     (the default), LAM will trace the application.  Before the
     trace data can be viewed in XMPI, it must be dumped to a
     file.  This is done by selecting the "Dump" item from the
     "Trace" menu.  You will be prompted for a file name.  By
     convention, XMPI trace files have a ".lamtr" suffix.  The
     trace file can be viewed by loading it as described above.
     As a shortcut, select the "Express" item in the "Trace"
     menu, or equivalently click the "Trace" button in the
     overview window.  This dumps the trace data to a temporary
     file and then immediately loads the file for viewing.  If
     you decide that you want to save trace data for later
     viewing, you must dump it using the "Dump" item from the
     "Trace" menu.  Dumping trace data to a file does not purge
     any trace data, so a subsequent dump will contain all the
     trace data from the start of the application up to the time
     of dumping.  Terminating an application via the "Clean"
     button or menu item purges all trace data.

     While a trace is being viewed, an application previously
     launched by XMPI continues to run in the background.  When
     the trace window is closed, XMPI returns to snapshot mode if
     there is a running application.

     When loading trace files containing multiple segments (see
     MPIL_Trace_on(2) and MPIL_Trace_off(2)), you will be prompted
     for the number of the segment you wish to view.  To view a
     different segment later, simply reload the trace file and
     specify the new segment number when prompted.  Reloading is
     done via the "View" or "Express" items in the "Trace" menu.

  Communication Timeline Window
     Across the top of the timeline window is a control and
     information area.  The trace data is displayed below this on
     timelines, one per process in the traced application.  The
     state of the application at a particular time is represented
     by the corresponding traffic light color.  Green represents
     running, red represents blocked waiting on communication,
     and yellow represents time spent inside an MPI function not
     blocked on communication (we call this system overhead time,
     as it typically represents time spent on data conversion,
     message packing, etc.).

     The dial can be used to select a time point at which the
     process states are to be displayed.  In the overview window,
     the process states at the dial time are displayed in hexagon
     form.  As in snapshot mode, more detailed information on a
     process can be obtained by bringing up its focus window.
     The dial may be moved by clicking with the left button in
     the trace view area or via the VCR controls.  Below the VCR
     controls are displayed, from left to right, the time of the
     left edge of the displayed timeline, the current dial time
     and the time of the right edge of the displayed timeline.
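     Looking up a process state at the dial time amounts to
     finding the timeline segment that contains that time.  The
     C sketch below is hypothetical: the segment structure, the
     pstate names and the use of half-open [start, end) intervals
     are assumptions, while the three colors and their meanings
     come from the description above.  Returning NO_STATE outside
     all segments mirrors the hexagon display, where no traffic
     light is shown before MPI is initialized or after it is
     finalized.

```c
#include <stddef.h>

/* Traffic-light states used on the timelines: green = running,
 * yellow = system overhead inside MPI, red = blocked on communication. */
enum pstate { GREEN_RUNNING, YELLOW_OVERHEAD, RED_BLOCKED, NO_STATE };

/* Hypothetical timeline segment: the state held over [start, end). */
struct segment {
    double start, end;
    enum pstate state;
};

/* Return the state of one process at dial time t, given its segments
 * sorted by start time and non-overlapping. Binary search keeps the
 * lookup fast even for long timelines. NO_STATE is returned when t
 * falls outside every segment. */
enum pstate state_at(const struct segment *seg, size_t n, double t)
{
    size_t lo = 0, hi = n;               /* search within [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (t < seg[mid].start)
            hi = mid;                    /* t lies before this segment */
        else if (t >= seg[mid].end)
            lo = mid + 1;                /* t lies after this segment */
        else
            return seg[mid].state;       /* start <= t < end */
    }
    return NO_STATE;
}
```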

     To the right of the VCR controls, the current magnification
     is displayed.  When a trace file is loaded, XMPI chooses an
     initial scaling factor and sets this to be the 1x1
     magnification.  You can increase and decrease the
     magnification using the zoom and un-zoom buttons.

     A  segment  of  the  currently  displayed  timeline  can  be
     selected  by dragging the right mouse button in the timeline
     display area.  Upon release of the right button the  display
     is  zoomed to show the selected segment. To cancel a drag in
     progress, drag the cursor up or down  out  of  the  timeline
     display area.
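     The drag-to-zoom behaviour can be sketched as a linear
     mapping from pixel columns to trace times.  The view
     structure, that linear mapping and the functions below are
     assumptions made for illustration; this manual page does not
     describe XMPI's scaling code or how it formats the displayed
     magnification.  Cancelling a drag by leaving the display
     area is not modelled here; an empty selection is simply
     rejected.

```c
/* Hypothetical visible time window of the timeline display. */
struct view {
    double left, right;   /* times at the left and right edges */
    int width_px;         /* width of the timeline display area in pixels */
};

/* Convert a pixel column in the display area to a trace time. */
static double pixel_to_time(const struct view *v, int x)
{
    return v->left + (v->right - v->left) * x / v->width_px;
}

/* Zoom the view to the segment selected by dragging from pixel x0 to x1
 * (in either direction). Returns 0 on success, or -1 if the selection is
 * empty, in which case the view is left unchanged. */
int zoom_to_selection(struct view *v, int x0, int x1)
{
    if (x0 > x1) {                       /* allow right-to-left drags */
        int tmp = x0;
        x0 = x1;
        x1 = tmp;
    }
    if (x0 == x1 || v->width_px <= 0)
        return -1;

    double t0 = pixel_to_time(v, x0);    /* compute both before updating */
    double t1 = pixel_to_time(v, x1);
    v->left = t0;
    v->right = t1;
    return 0;
}

/* Magnification relative to the initial (1x1) scaling chosen at load
 * time, expressed as a ratio of visible time spans. */
double magnification(const struct view *v, double initial_span)
{
    return initial_span / (v->right - v->left);
}
```

     With a 100-pixel display showing times 0 to 10, dragging
     from pixel 60 back to pixel 20 selects times 2 to 6, a
     magnification of 2.5 relative to the initial span of 10.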

  How Communication Is Represented
     Collective
         A collective communication is represented for each
         process by contiguous line segments showing the time
         spent in system overhead and the time spent blocked
         waiting for communication.  No lines are drawn
         connecting the processes participating in the collective
         communication.

     Blocking
         For both the sending and receiving process, contiguous
         line segments are drawn showing the time spent in system
         overhead and the time spent blocked waiting for the
         communication to complete.  A line is drawn connecting
         the send to the receive.  It originates at the beginning
         of the send segments and is drawn to the end of the
         matching receive segments.

     Non-blocking
         At the time a non-blocking send or receive is initiated,
         a system overhead segment is drawn.  When the
         communication is completed via a wait or test, segments
         showing system overhead and blocking time are drawn.
         Lines are drawn between matching sends and receives,
         except that in this case the line is drawn from the
         segment where the send was initiated to where the
         corresponding receive completed.

     Waits
         If a non-blocking communication is completed inside a
         wait/test function, XMPI will show the function name in
         the focus window as the wait/test function followed in
         parentheses by the send/receive function being
         completed.  For example, if an MPI_Issend() is completed
         inside an MPI_Wait(), the function will read MPI_Wait
         (MPI_Issend).

     Missing
         Owing to the use of trace segments or  the  dropping  of
         overflow  traces  (see lamtrace(1)) there may be send or
         receive traces which have no match in  the  trace  data.
         In  these  cases  a  short stub line is drawn out from a
         send or in to a receive.
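     The line-drawing rules above depend on pairing each send
     with its receive.  The sketch below is not XMPI's algorithm;
     it assumes hypothetical ptp and line structures, trace arrays
     sorted by the time each operation was initiated, and
     receives that never use MPI_ANY_SOURCE or MPI_ANY_TAG.  Under
     those assumptions, matching each receive with the earliest
     unmatched send that has the same source, destination,
     communicator and tag follows MPI's non-overtaking rule.
     Records left unmatched correspond to the "Missing" case and
     would be drawn as stubs.

```c
#include <stddef.h>

/* Hypothetical trace record for one point-to-point operation. For a
 * blocking call, start/end bound the send or receive segments; for a
 * non-blocking call, start is when the operation was initiated and end
 * is when it completed in a wait or test. */
struct ptp {
    int src, dst, comm, tag;
    double start, end;
    int matched;            /* set to 1 by match_messages */
};

/* Hypothetical line to draw: from the send's start to the receive's end. */
struct line {
    int from_rank, to_rank;
    double t_from, t_to;
};

/* Pair sends with receives and write one line per matched pair to out
 * (which needs room for nr entries). Returns the number of lines. */
size_t match_messages(struct ptp *sends, size_t ns,
                      struct ptp *recvs, size_t nr, struct line *out)
{
    size_t nl = 0;

    for (size_t r = 0; r < nr; r++) {
        struct ptp *rv = &recvs[r];
        for (size_t s = 0; s < ns; s++) {
            struct ptp *sd = &sends[s];
            if (sd->matched || sd->src != rv->src || sd->dst != rv->dst ||
                sd->comm != rv->comm || sd->tag != rv->tag)
                continue;                /* not a candidate for this receive */
            sd->matched = rv->matched = 1;
            out[nl].from_rank = sd->src;
            out[nl].to_rank = rv->dst;
            out[nl].t_from = sd->start;  /* where the send began */
            out[nl].t_to = rv->end;      /* where the receive completed */
            nl++;
            break;                       /* earliest matching send wins */
        }
    }
    return nl;
}
```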

  Kiviat Window
     When viewing a trace file, the "Kiviat" button or "Kiviat"
     item from the "Trace" menu brings up the Kiviat window.
     This window displays, in a segmented pie-chart format, the
     cumulative time spent by each process in the running,
     overhead and blocked states up to the current dial time.
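     The Kiviat totals can be computed by clipping each timeline
     segment at the dial time and summing per state.  The
     interval structure, the kstate names and the function
     kiviat_totals are hypothetical, and segments are assumed to
     be non-overlapping half-open [start, end) intervals.  Each
     slice of a process's pie would presumably be its total
     divided by the sum of the three totals; how XMPI actually
     scales the chart is not described here.

```c
#include <stddef.h>

/* Hypothetical per-process states, matching the timeline colors. */
enum kstate { K_RUNNING, K_OVERHEAD, K_BLOCKED, K_NSTATES };

/* Hypothetical timeline segment: the state held over [start, end). */
struct interval {
    double start, end;
    enum kstate state;
};

/* Sum, for one process, the time spent in each state from the start of
 * the trace up to dial time t. A segment that straddles t is counted
 * only up to t; segments starting at or after t are ignored. */
void kiviat_totals(const struct interval *seg, size_t n, double t,
                   double totals[K_NSTATES])
{
    for (int s = 0; s < K_NSTATES; s++)
        totals[s] = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (seg[i].start >= t)
            continue;
        double end = seg[i].end < t ? seg[i].end : t;   /* clip at dial */
        totals[seg[i].state] += end - seg[i].start;
    }
}
```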


MESSAGE SOURCE MATRIX

     The message source window displays a square matrix of
     process message queue lengths.  For each process, it shows
     the number of queued messages from each other process in
     the application.  It can be brought up while monitoring a
     running application or while viewing a trace file, by
     clicking the "Matrix" button or selecting the "Matrix" item
     in the "Trace" menu.
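     Building such a matrix is a counting pass over the queued
     messages.  In this sketch, the queued structure and the
     source_matrix function are hypothetical, ranks are assumed
     to be MPI_COMM_WORLD ranks from 0 to nprocs - 1, and rows
     index the receiving process while columns index the sender;
     this manual page does not state the orientation XMPI uses on
     screen.

```c
#include <stddef.h>

/* Hypothetical queued-message record. */
struct queued {
    int src;   /* rank of the sending process in MPI_COMM_WORLD */
    int dst;   /* rank of the process whose queue holds the message */
};

/* Fill an nprocs x nprocs row-major matrix m so that
 * m[dst * nprocs + src] counts the messages queued at process dst that
 * came from process src. Ranks outside [0, nprocs) are ignored. */
void source_matrix(const struct queued *q, size_t nq, int nprocs, int *m)
{
    for (int i = 0; i < nprocs * nprocs; i++)
        m[i] = 0;
    for (size_t k = 0; k < nq; k++) {
        if (q[k].src < 0 || q[k].src >= nprocs ||
            q[k].dst < 0 || q[k].dst >= nprocs)
            continue;
        m[q[k].dst * nprocs + q[k].src]++;
    }
}
```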


DATATYPE WINDOW

     The datatype window displays a textual representation of the
     type map of an MPI datatype.  At any instant, this window is
     associated with a particular process and mode.  The
     associated process is shown in the window's banner, and the
     mode is indicated by a traffic light or message queue icon
     shown in the left part of the window.  In process mode, the
     datatype shown, if any, is the datatype argument of the MPI
     function the process is executing.  In message mode, the
     datatype is that of the current message aggregate selected
     in the process focus window.  Switching between processes
     and modes is done via the datatype buttons in the process
     focus windows.

     The type map might not fit completely into the default  size
     window.  Simply resize the window to see the whole map.


SWITCHING INFORMATION SOURCES

     XMPI will gather and display information from either the
     currently executing application or a trace file.  When an
     application is launched from XMPI, the information source is
     the executing application and the "Snap" button is active.
     Though the application may be producing trace data, the
     "Snap" button does not use it, but instead acquires
     information from debugging hooks in the MPI implementation.
     At any moment, an existing trace file may be loaded into
     XMPI, or the currently accumulating trace data may be
     fetched from the MPI implementation, stored in a file and
     loaded.  This action changes the information source to the
     loaded trace file.  Information display is now controlled
     from the dial in the timeline window and not from the "Snap"
     button, which is now inactive.  Though the application may
     still be running, the timeline dial does not use the runtime
     debugging hooks, but instead acquires information from the
     loaded trace file.  When the trace window is closed, XMPI
     returns to snapshot mode if there is a running application.


RESOURCES

     XMPI defines the following application resources.

     XMPI.helpCmd        command that is run to provide help. The
                         default  is  typically  a  command which
                         fires up a Web browser to  view  a  help
                         page.   You should change this to invoke
                         your favourite browser.

     XMPI.rankFont       process rank font in hexagon

     XMPI.msgFont        total message count font in hexagon (may
                         need to be adjusted to fit inside the
                         message icon)

     XMPI.lcomCol        color used to highlight the processes in
                         an intracommunicator or in the local
                         group of an intercommunicator

     XMPI.rcomCol        color used to highlight the processes in
                         the remote group of an intercommunicator

     XMPI.bandCol        color used for the zoom selection rubber
                         band

     XMPI.bandDash       if True use a dashed line rubber band to
                         show  the zoom selection otherwise use a
                         solid line

     XMPI.bandWidth      width of the zoom selection rubber band

     XMPI gets important default resources from  the  application
     defaults  file,  XMPI.  If this file is not installed in the
     X11 default directory, its directory can  be  added  to  the
     XAPPLRESDIR environment variable.


LIMITATIONS

     An application must be started by XMPI to  be  monitored  by
     it.

     When using the fast client-to-client communication mode,
     process states in snapshot mode are always shown as running,
     and no useful information is shown in the process focus
     windows.

     XMPI uses lamclean(1).  Errors reported by this tool will
     still print to standard output.  A shorter message will
     appear in an XMPI error dialog.


SEE ALSO

     mpimsg(1), mpirun(1), mpitask(1), lamtrace(1)