
LAM/MPI General User's Mailing List Archives


From: Michael Creel (michael.creel_at_[hidden])
Date: 2007-02-23 07:59:08


I'm experiencing a problem with LAM/MPI. It occurs both with the 7.1.1 package in
Debian unstable and with 7.1.3 compiled from source on 64-bit Kubuntu Edgy. The
configure switches I use are:
./configure --enable-shared --disable-static --with-modules --with-trillium

I then use MPITB for GNU Octave, compiled against each of the two LAM/MPI
versions. I run a script that performs a kernel regression on increasingly large
data sets, using from 1 to 4 compute nodes (the cluster consists of 2 machines,
each with two 64-bit Xeon processors running at 3.6 GHz). The output I get is as
follows, with the error at the end.

########################################################################
kernel regression example with several sample sizes serial/parallel timings
  4000 data points and 1 compute nodes: 2.359059
  4000 data points and 2 compute nodes: 1.804235
  4000 data points and 3 compute nodes: 1.578466
  4000 data points and 4 compute nodes: 1.815341
  8000 data points and 1 compute nodes: 8.486310
  8000 data points and 2 compute nodes: 4.810935
  8000 data points and 3 compute nodes: 3.553705
  8000 data points and 4 compute nodes: 3.292904
  10000 data points and 1 compute nodes: 12.804898
  10000 data points and 2 compute nodes: 7.040623
  10000 data points and 3 compute nodes: 5.084657
  10000 data points and 4 compute nodes: 4.369931
  12000 data points and 1 compute nodes: 18.254475
  12000 data points and 2 compute nodes: 10.608161
  12000 data points and 3 compute nodes: 6.967206
  12000 data points and 4 compute nodes: 5.812930
  16000 data points and 1 compute nodes: 34.901709
  16000 data points and 2 compute nodes: 18.662503
  16000 data points and 3 compute nodes: 13.253614
  16000 data points and 4 compute nodes: 10.133724
  20000 data points and 1 compute nodes: 60.665225
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Intercomm_merge()
Rank (0, MPI_COMM_WORLD): - MPI_Comm_spawn()
Rank (0, MPI_COMM_WORLD): - main()
MPI_Intercomm_merge: internal MPI error: out of descriptors (rank 0, comm 82)
MPI_Intercomm_merge: internal MPI error: out of descriptors (rank 0,
MPI_COMM_PARENT)

michael_at_parallelknoppix1:~/Octave/Econometrics/Parallel/kernel$

So the script works fine up to a point. The failure is reproducible - it always
occurs once the problem gets large enough. Any ideas what the cause might be? Thanks,
Michael