On Sun, 25 Apr 2004, Irshad Ahmed wrote:
> How can we get a trace of MPI calls in a parallel program that also has
> detached threads, where the threads also execute MPI calls? The program
> works correctly and has been tested many times, and is now in the
> analysis phase. (POSIX threads are used.)
Just remember the disclaimer from the previous mails on this -- LAM does
*not* support multiple threads within MPI simultaneously. You can have a
multi-threaded program, but only one thread can be in MPI at any given
time. See the FAQ and the User's Guide for more information on threads.
> I tried to use XMPI, but at the end of execution XMPI shows only the
> last thread that executed an MPI call (the "sndng-rcvng-DIE.lamtr"
> trace file is attached). The file shows only the end portion of the
> program.
The LAM traces are kept in a round-robin (circular) buffer that
periodically overwrites itself, so only the most recent events are
retained -- which is why you only see the end of the program.
> I think that XMPI flushes out the trace, or cannot show multiple
> threads.
The former is true, but I'm not sure what the latter means -- from LAM's
point of view, it doesn't know or care about threads. It just knows that
your application has invoked MPI calls. All of them will be in the
traces, not just the ones from a single thread.
> 1- ANY IDEA ABOUT WHAT I SHOULD DO?
It depends on what you're trying to do. Are you trying to get a
per-thread timeline of MPI calls?
(see below)
> 2- STEPS: HOW SHOULD A PARALLEL PROGRAM BE ANALYZED? I mean, which
> things should get proper attention, and how?
Again, it depends on what you're trying to do.
If XMPI is not sufficient for what you're doing (and it sounds like it
isn't), you may want to investigate other profiling tools and/or write
your own. Before you shudder at the thought -- it may not be as hard as
you think.
MPI was designed with link-time interception of MPI calls in mind.
Hence, you can intercept a call to MPI_Send, record the fact that it was
called (and, in your case, from which thread it was called), and then
call the real/underlying MPI_Send. This is known as MPI's profiling
layer. In this case, you'll provide your own MPI_Send function (and all
other MPI functions that you want to intercept) and do something like
this (this is typed off the top of my head -- pardon typos):
-----
#include <stdio.h>
#include <pthread.h>
#include <mpi.h>

/* Per-process trace output file */
static FILE *trace_file = NULL;

int MPI_Init(int *argc, char ***argv) {
  /* ... open trace output files ... */
  trace_file = fopen("mpi_trace.txt", "w");
  /* Call the real/underlying MPI_Init, named PMPI_Init */
  return PMPI_Init(argc, argv);
}

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm) {
  int ret;
  pthread_t tid = pthread_self();
  /* ...write out the fact that you called MPI_Send with this
     thread ID... */
  fprintf(trace_file, "thread %lu: entering MPI_Send\n",
          (unsigned long) tid);
  /* Call the real/underlying MPI_Send, named PMPI_Send */
  ret = PMPI_Send(buf, count, datatype, dest, tag, comm);
  /* ...write out the fact that MPI_Send returned with this
     thread ID... */
  fprintf(trace_file, "thread %lu: MPI_Send returned %d\n",
          (unsigned long) tid, ret);
  return ret;
}

int MPI_Finalize(void) {
  /* ... close trace output files ... */
  if (trace_file != NULL)
    fclose(trace_file);
  return PMPI_Finalize();
}
-----
Something along those lines may be useful.
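To use wrappers like these, compile them into an object file (or
library) and list it before the MPI library on your link line; the
linker will then resolve your application's calls to MPI_Send to your
wrapper, and the wrapper's call to PMPI_Send to the real implementation.
With LAM's wrapper compiler that should be as simple as something like
"mpicc myapp.c mpitrace.o -o myapp" ("mpitrace.o" here is just whatever
you named your compiled wrappers -- adjust to taste).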
> 3- Which technique is better for executing a parallel program: having
> the executable file of the parallel program on each node, or having
> only one executable file on the node that initiates the mpirun
> command? REASON, why?
Short version: either is fine.
Longer version: I can parse your question multiple ways, so I'm not
entirely sure what you are asking.
1. Running the same executable on every node vs. running different
executables.
2. Having one executable on the node with mpirun, and using the "-s"
switch to send it to all the nodes to execute.
Either situation is fine. Using "-s" is a bit slower at launch, but it
works fine and has no impact on the run-time performance of your
application.
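For example (this is from memory of LAM's mpirun syntax, so check the
mpirun(1) man page before relying on it):
-----
# Load my_app from the local node (h) and run one copy on every node (N)
mpirun -s h N my_app
-----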
LAM allows quite a flexible system for launching applications so that
developers can do whatever is most convenient for them. So whether you
rdist/rsync/NFS your application out to all the nodes, whether you have
different executables, or whether you use -s, it makes no difference
once you hit MPI_INIT. My advice for getting your application out to all
the nodes is to use whatever method is a) most convenient and b) fastest.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/