On Feb 2, 2006, at 10:41 AM, Angel Tsankov wrote:
> The last MPI function I call before MPI_Finalize is MPI_Allreduce. As
> far as I can see, it has been implemented in terms of other MPI
> functions.
> Having this in mind, is it possible that one of the MPI processes
> exits MPI_Allreduce before all the other processes and calls
> MPI_Finalize before they have finished their call to MPI_Allreduce? If
> so, then later on one of the other processes could detect that one of
> the processes is missing. Is it possible that LAM fails with SIGILL in
> this situation?
My e-mail really gets away from me sometimes; I apologize it's taken
so long to reply. :-(
If you have an Allreduce as the last thing before Finalize, it should
execute properly before the process exits. All of LAM's allreduce
implementations are synchronizing in nature, meaning that no one will
leave it before everyone has entered it (they're implemented in a
reduce-followed-by-broadcast model). So while it's impossible to
guarantee that everyone enters Finalize at exactly the same time,
your Allreduce should have the same effect as a barrier (i.e., they
should all finish Allreduce and enter Finalize at more-or-less the
same time).
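To make that concrete, here's a minimal sketch of the pattern we're
talking about (the variable names are just placeholders, not anything
from your code):

  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
      int rank, local, global;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      local = rank;
      /* No process can get a result out of the Allreduce until every
         process has contributed its input (reduce, then broadcast). */
      MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM,
                    MPI_COMM_WORLD);
      printf("rank %d: sum = %d\n", rank, global);

      /* By the time any process reaches Finalize, every process has
         at least entered the Allreduce above. */
      MPI_Finalize();
      return 0;
  }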
>> Can you run this application through a memory-checking debugger? If
>> you have access to an x86-based machine, you can use the valgrind
>> memory-checking debugger.
>
> I have access to a single x86 workstation only. Does it make sense to
> run multiple MPI processes on a single-cpu machine with Valgrind?
Absolutely. It'll run a bit slowly because you're heavily overloading
the CPU -- so you may want to run only 2-4 processes -- but it's
something that I do all the time.
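For what it's worth, I usually launch it with something along these
lines (the program name is just a placeholder, and your valgrind
options may differ):

  mpirun -np 2 valgrind --tool=memcheck ./my_mpi_app

Each MPI process then runs under its own copy of valgrind, so you get
a separate memory-checking report per process.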
> The cluster where LAM 7.1.1 is installed and where the MPI program
> fails as explained in my original post consists of G4 PowerPCs. Can I
> run Valgrind on this cluster? The manual mentions that Valgrind can
> run on PPC.
I honestly don't know; if Valgrind says that it can run on PPC, then
give it a whirl.
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/