LAM/MPI General User's Mailing List Archives

From: Brian Barrett (brbarret_at_[hidden])
Date: 2005-03-10 18:34:12


On Mar 10, 2005, at 5:12 AM, Bob Felderman wrote:

> The race problem I reported earlier this week (using usysv) appears to be
> related to the implementation of
>
> share/ssi/coll/smp/src/ssi_coll_smp_allreduce.c
>
> Between tests, the Pallas benchmarks execute Barrier(), then
> set up a new communicator. This leads to all processes calling
> MPI_Comm_split which is implemented using MPI_Allreduce.

After some off-list discussion and some local testing, it appears
there is a race condition that can lead to deadlock when using the SMP
collective algorithms. We think we have it nailed down and will have a
fix in 7.1.2.

Brian

-- 
   Brian Barrett
   LAM/MPI developer and all around nice guy
   Have an LAM/MPI day: http://www.lam-mpi.org/