LAM/MPI General User's Mailing List Archives

From: Geoffrey Irving (irving_at_[hidden])
Date: 2005-11-20 15:30:43


Hello,

I'm getting a weird deadlock when trying to create a new communicator. I'm running
6 processes on two quad-processor machines (4 on one and 2 on the other), and trying to
create a communicator for the first two processes. I successfully create a group
containing the first two processes (ranks 0 and 1), and then every process calls
MPI_Comm_create (actually the C++ binding). Processes 1 and 2 successfully complete
the call and proceed to other communication. Processes 0, 3, 4, and 5 never return from the
call to MPI_Comm_create. The deadlock is deterministic, including which processes
return and which don't.

As far as I can tell I'm passing correct arguments to the functions involved.
Unfortunately the set of processes that completes the call doesn't seem to correlate
with anything: the new communicator should contain {0,1}, and processes {0,1,2,3} are
on the same machine, but {1,2} succeed.

The program has already done a fair amount of communication before it reaches this point,
including creating other communicators. I'm running LAM 7.1.1.

Thanks,
Geoffrey