
LAM/MPI General User's Mailing List Archives


From: Javier Fernández (javier_at_[hidden])
Date: 2007-02-23 14:09:31


Michael Creel wrote:

> Then I'm using MPITB for GNU Octave, compiled against either of the
> two versions of LAM/MPI. I run a script which performs a task using an
> increasingly large data set, using from 1 to 4 nodes

> ########################################################################
> kernel regression example with several sample sizes serial/parallel
> timings
> 4000 data points and 1 compute nodes: 2.359059
> 4000 data points and 2 compute nodes: 1.804235
> 4000 data points and 3 compute nodes: 1.578466
> 4000 data points and 4 compute nodes: 1.815341

Perhaps you could also check the speedup with just 2 processes on
different nodes... just to tell whether it is the algorithm that does
not scale well, or the biprocessors contending for memory access.

> 8000 data points and 1 compute nodes: 8.486310
> 8000 data points and 2 compute nodes: 4.810935

Not twice as fast, but if, as you said, those are biprocessors, LAM will
be scheduling both processes onto the first node.
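Just to be sure about where LAM actually puts the processes, a quick
check with plain C MPI (not MPITB; just a sketch, I haven't run it)
would make every rank print the host it landed on:

  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank, len;
      char host[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);   /* which node got this rank? */
      printf("rank %d runs on %s\n", rank, host);
      MPI_Finalize();
      return 0;
  }

If both ranks report the same host, you are really measuring the two
CPUs of one biprocessor fighting over memory, not two separate nodes.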

> 16000 data points and 1 compute nodes: 34.901709
> 16000 data points and 2 compute nodes: 18.662503

Well, almost twice.

> 16000 data points and 3 compute nodes: 13.253614
> 16000 data points and 4 compute nodes: 10.133724
> 20000 data points and 1 compute nodes: 60.665225
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Intercomm_merge()
> Rank (0, MPI_COMM_WORLD): - MPI_Comm_spawn()
> Rank (0, MPI_COMM_WORLD): - main()
> MPI_Intercomm_merge: internal MPI error: out of descriptors (rank 0,
> comm 82)
> MPI_Intercomm_merge: internal MPI error: out of descriptors (rank 0,
> MPI_COMM_PARENT)

I fear my ignorance will show up here... did you really obtain _that_
call stack? MPI_Intercomm_merge called from within MPI_Comm_spawn?
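In any case, spawning returns an inter-communicator and merging it is
the usual next step, so something like the following (a plain C sketch,
not the MPITB code itself; "worker" and the slave count are invented)
would put MPI_Intercomm_merge right next to MPI_Comm_spawn. Note that
both calls create a new communicator every time they run:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm children;   /* intercommunicator returned by the spawn */
      MPI_Comm everyone;   /* parent + children merged together       */

      MPI_Init(&argc, &argv);

      /* spawn 3 workers and merge them with the parent */
      MPI_Comm_spawn("worker", MPI_ARGV_NULL, 3, MPI_INFO_NULL,
                     0, MPI_COMM_SELF, &children, MPI_ERRCODES_IGNORE);
      MPI_Intercomm_merge(children, 0, &everyone);

      /* ... talk to the workers over 'everyone' ... */

      MPI_Finalize();
      return 0;
  }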

> This is reproducible - it always happens when the problem gets large
> enough. Any ideas what the problem might be? Thanks, Michael
>
comm 82 is a rather high number. Have you already tried freeing those
merged communicators after use? I'm not sure MPI_Finalize can
automagically free them... in fact, I'm not sure you even call
MPI_Finalize between epochs :-)
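If that is indeed the pattern, then at the end of each epoch the two
communicators from the sketch above should be released, e.g. (again
just a guess at the fix, using the plain C names; MPITB should have
wrappers with the same names):

  /* release what the spawn/merge pair created, so descriptors get
     recycled instead of piling up until "comm 82" */
  MPI_Comm_free(&everyone);   /* the merged intracommunicator   */
  MPI_Comm_free(&children);   /* the intercommunicator; LAM's
                                 MPI_Comm_disconnect should work too */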

Hope that does the trick.

-javier