
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-02-02 07:05:56


On Feb 1, 2006, at 7:13 PM, Brian Wainscott wrote:

> We are running lam-6.5.9 and our application is getting this error
> message (seems to also happen on lam-7.x):

My first thought is "please upgrade if possible!" :-), but if this is
also happening with 7.1.1, then this might actually be a problem in LAM.

> MPI_Comm_dup: internal MPI error: out of descriptors (rank 0, comm
> 4087)
>
> From looking at the source code I can see there is a limit of
> about 4096
> or so communicators. The thing is, we have checked carefully and we
> only have about 20 or so communicators at any given time -- they
> regularly get created and freed.
>
> So my question is this: is it possible that, even though we call
> MPI_COMM_FREE, the communicator is not freed? I suspect an
> unwaited-for ISEND or IRECV somewhere that is causing a communicator
> to be kept internally after we free it. We are checking on this now,
> but I wonder if there is something else that might be going on?

A communicator should not be holding a file descriptor open; the way
the data structures are set up, communicators are not the entities
that "own" network resources (i.e., communicators have links to the
underlying reference-counted data structures that "own" network
resources such as file descriptors).

Can you describe the situation a little more?

- Does this always happen at the same point in your code? I.e., is
it reproducible in a regular fashion?

- If you're on an operating system with /proc, can you look during a
run and see what all the fd's are being used for?
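[Editor's note: on Linux that check looks something like the
following; the pid of the MPI process is whatever ps reports for your
application, shown here with $$ (the current shell) as a stand-in.]

```shell
# Inspect the open file descriptors of a running process.
# Replace $$ with the pid of your MPI process.
pid=$$
ls -l /proc/$pid/fd        # each symlink shows what the fd refers to
ls /proc/$pid/fd | wc -l   # total number of open descriptors
```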

- Can you attach a debugger and see exactly where in MPI_COMM_DUP this
error is occurring? There are several places in share/mpi/cdup.c
where MPI_ERR_INTERN could be returned; knowing which one it is might
be helpful in tracking down the cause.

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/