On Mar 10, 2005, at 5:12 AM, Bob Felderman wrote:
> The race problem I reported earlier this week (using usysv) appears to be
> related to the implementation of
>
> share/ssi/coll/smp/src/ssi_coll_smp_allreduce.c
>
> Between tests, the Pallas benchmarks execute Barrier(), then
> set up a new communicator. This leads to all processes calling
> MPI_Comm_split, which is implemented using MPI_Allreduce.
After some off-list discussion and some local testing, it appears
there is a race condition that can lead to deadlock when using the SMP
collective algorithms. We think we have it nailed down and will have a
fix in 7.1.2.
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have an LAM/MPI day: http://www.lam-mpi.org/