I'm experiencing a problem with LAM/MPI. It occurs both with the 7.1.1 package
in Debian unstable and with 7.1.3 compiled from source on 64-bit Kubuntu Edgy.
The configure switches I use are
./configure --enable-shared --disable-static --with-modules --with-trillium
I'm then using MPITB for GNU Octave, compiled against either of the two versions
of LAM/MPI. I run a script that performs a task on increasingly large data
sets, using from 1 to 4 compute nodes (the cluster consists of 2 machines, each
with 2 64-bit Xeon processors running at 3.6 GHz). The output I get is as
follows, with the error at the end.
########################################################################
kernel regression example with several sample sizes serial/parallel timings
4000 data points and 1 compute nodes: 2.359059
4000 data points and 2 compute nodes: 1.804235
4000 data points and 3 compute nodes: 1.578466
4000 data points and 4 compute nodes: 1.815341
8000 data points and 1 compute nodes: 8.486310
8000 data points and 2 compute nodes: 4.810935
8000 data points and 3 compute nodes: 3.553705
8000 data points and 4 compute nodes: 3.292904
10000 data points and 1 compute nodes: 12.804898
10000 data points and 2 compute nodes: 7.040623
10000 data points and 3 compute nodes: 5.084657
10000 data points and 4 compute nodes: 4.369931
12000 data points and 1 compute nodes: 18.254475
12000 data points and 2 compute nodes: 10.608161
12000 data points and 3 compute nodes: 6.967206
12000 data points and 4 compute nodes: 5.812930
16000 data points and 1 compute nodes: 34.901709
16000 data points and 2 compute nodes: 18.662503
16000 data points and 3 compute nodes: 13.253614
16000 data points and 4 compute nodes: 10.133724
20000 data points and 1 compute nodes: 60.665225
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Intercomm_merge()
Rank (0, MPI_COMM_WORLD): - MPI_Comm_spawn()
Rank (0, MPI_COMM_WORLD): - main()
MPI_Intercomm_merge: internal MPI error: out of descriptors (rank 0, comm 82)
MPI_Intercomm_merge: internal MPI error: out of descriptors (rank 0,
MPI_COMM_PARENT)
michael_at_parallelknoppix1:~/Octave/Econometrics/Parallel/kernel$
So the script works ok up to a point. This is reproducible - it always happens
when the problem gets large enough. Any ideas what the problem might be? Thanks,
Michael
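
P.S. In case it helps in reading the timings, here is a small sketch that turns one group of rows into speedup and parallel efficiency relative to the 1-node run. The numbers are copied from the 16000-point rows of the output above. The function name speedup_and_efficiency is only an illustrative helper of mine, not part of MPITB or LAM/MPI, and this says nothing about the cause of the error.

```python
# Timings in seconds, copied from the 16000-data-point rows of the output.
timings = {1: 34.901709, 2: 18.662503, 3: 13.253614, 4: 10.133724}


def speedup_and_efficiency(timings):
    """Return {nodes: (speedup, efficiency)} relative to the 1-node run.

    Hypothetical helper for illustration only.
    speedup    = time on 1 node / time on n nodes
    efficiency = speedup / n
    """
    serial = timings[1]
    result = {}
    for nodes, seconds in sorted(timings.items()):
        speedup = serial / seconds
        result[nodes] = (speedup, speedup / nodes)
    return result


if __name__ == "__main__":
    for nodes, (s, e) in speedup_and_efficiency(timings).items():
        print(f"{nodes} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```

The same function can be applied to any of the other sample sizes by swapping in that group's four timings.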