LAM/MPI General User's Mailing List Archives

From: Javier Fernandez Baldomero (javier_at_[hidden])
Date: 2004-06-04 16:53:08


Hi all,

Sorry for this somewhat off-topic question, but I'm not sure where else
I could find people knowledgeable about (or interested in) this topic.
Sorry for the inconvenience, and thanks in advance for any help or ideas.

I am using LAM 7.0.2 / 7.0.4 under Linux kernels 2.4.20 / 2.6.6.
Which tools can be used to monitor memory contention on a dual-processor
machine? Is there any /proc file that reveals anything related to memory
access speed?

I am running an embarrassingly parallel LAM application that shows
nearly linear speedup on a small cluster (8 Pentium II 333 MHz nodes,
128 MB RAM, 512 KB cache).

Now I have moved to a dual-processor cluster (8 Athlon MP 2400+ 2 GHz
nodes, 2 GB RAM, 256 KB cache), and I noticed that when the first dual
node is used with both CPUs, runtime does not decrease. Using further
dual nodes may improve runtime, but far from linearly. If I use only
1 CPU from each dual node, I get the usual linear speedup graph. I could
provide graphs, data, or even the software involved (it's all free) if
anybody is interested. I would be glad to provide detailed instructions
to reproduce the problem on a dual-processor cluster, although I think
the approach below is better.

While trying to reduce the problem to a minimal test case to ask for
help, I've been able to take LAM out of the picture. It has to do with
the application itself, octave (a MATLAB-like program), and in
particular with matrix-language (vectorized) operations. A slower
version of my test program that uses only for-loops (instead of
matrix-language) does not exhibit any problem.
When 1 octave process computes half the workload, it takes half the
time (speedup 2), but when 2 octaves each compute half the workload,
one on each CPU, they only reach speedup 1.6 (when using
matrix-language; with for-loops they do reach speedup 2). I posted the
question to the octave mailing list, but nobody replied.
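The arithmetic behind these numbers: speedup is S = T1/Tp and parallel
efficiency is E = S/p. A speedup of 1.6 on 2 CPUs means E = 0.8, i.e.
each octave process takes about 1.25 times as long on its half as it
would running alone. A small Python sketch of that calculation (the
function names and the 100 s timing are hypothetical, for illustration
only, not measured data):

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T1 / Tp."""
    return t_serial / t_parallel

def efficiency(s, p):
    """Parallel efficiency E = S / p (1.0 means perfectly linear)."""
    return s / p

def per_process_slowdown(s, p):
    """How much longer each of the p concurrent processes takes on its
    share than it would running alone: 1 / E = p / S."""
    return p / s

# Hypothetical timings (not measured): one octave does the full job in
# 100 s; two concurrent octaves, each doing half, finish in 62.5 s.
t1, t2 = 100.0, 62.5
s = speedup(t1, t2)  # 1.6
print(f"speedup {s:.2f}, efficiency {efficiency(s, 2):.2f}, "
      f"per-process slowdown {per_process_slowdown(s, 2):.2f}x")
```

By the same formulas, the for-loop version at speedup 2 corresponds to
E = 1.0, i.e. no per-process slowdown at all.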

http://www.octave.org/mailing-lists/help-octave/2004/1326

In that e-mail I included the small .m file used to reproduce the
problem on a dual-processor machine (in case anybody knows octave and
is interested in reproducing it), but I would be most grateful for just
an idea or some advice on how I could actually prove that it's a memory
contention problem (so that neither LAM, nor my test program, nor
octave for-loops are to blame :-).

If I can document (perhaps with some /proc file field, a Linux
monitoring tool, or anything else) that the for-loop version shows no
memory contention (hence the 2x speedup) while the matrix-language
version does, I'm happily done. Is there any tool to do that?
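One way to test the memory-contention hypothesis directly is to compare
memory bandwidth with one process against two processes running at the
same time, one per CPU. Below is a minimal sketch in Python, chosen here
only for illustration (the well-known tool for this kind of measurement
is the STREAM benchmark, written in C/Fortran). It repeatedly copies a
buffer far larger than the 256 KB cache, so each pass has to go to main
memory. The buffer size, repeat count, and function names are
hypothetical choices, not anything from LAM or octave. If the
per-process bandwidth with two copies running drops well below the solo
figure, the two CPUs are competing for the same memory path, which
would be consistent with the 1.6x speedup of the matrix-language
version.

```python
import multiprocessing as mp
import time

# Hypothetical parameters: 32 MB per buffer is far larger than a 256 KB
# L2 cache, so every copy pass must stream through main memory.
BUF_BYTES = 32 * 1024 * 1024
REPEATS = 50

def copy_bandwidth(buf_bytes=BUF_BYTES, repeats=REPEATS, barrier=None):
    """Return MB/s of memory traffic (read + write) for repeated bulk copies."""
    # Fill the source with non-zero data so every page is really allocated.
    src = bytearray(range(256)) * (buf_bytes // 256)
    dst = bytearray(len(src))
    dst[:] = src                  # warm-up pass: faults in dst's pages
    if barrier is not None:
        barrier.wait()            # start timing together with the other processes
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src              # bulk copy: each byte read once, written once
    elapsed = time.perf_counter() - start
    return 2 * len(src) * repeats / elapsed / 1e6

def _worker(buf_bytes, repeats, barrier, queue):
    queue.put(copy_bandwidth(buf_bytes, repeats, barrier))

def run_concurrent(nprocs, buf_bytes=BUF_BYTES, repeats=REPEATS):
    """Run nprocs copy loops simultaneously; return per-process MB/s."""
    queue = mp.Queue()
    barrier = mp.Barrier(nprocs)
    procs = [mp.Process(target=_worker, args=(buf_bytes, repeats, barrier, queue))
             for _ in range(nprocs)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]   # drain the queue before join()
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    solo = run_concurrent(1)[0]
    pair = run_concurrent(2)
    print(f"1 process  : {solo:8.1f} MB/s")
    print("2 processes: " + ", ".join(f"{b:8.1f}" for b in pair) + " MB/s each")
    # Near 1.0: no contention; well below 1.0: the CPUs share a bottleneck.
    print(f"per-process ratio: {min(pair) / solo:.2f}")
```

Run it on an otherwise idle node. Note that it only probes the memory
subsystem; it does not show which octave operations generate the
traffic.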

Sorry for the lengthy, possibly off-topic question, and thanks for any
advice.

-javier