On Mar 30, 2006, at 3:00 PM, Olof Mattsson wrote:
> We're two network and system administration students at the
> University of Skövde, Sweden. We've built an OSCAR cluster and now
> we need to benchmark it. We want to run the HPCC benchmark; we've
> compiled it with LAM and used ATLAS compiled on our system. The
> problem is that it works great on three nodes and four CPUs. As
> soon as we change hpccinf.txt to a larger grid (PxQ) to use all 15
> computing nodes (16 CPUs, 4x4), the benchmark won't start. We have
> also tried 2x4, 1x8, 2x3, 3x3, and 3x4, but nothing larger than 2x2
> or 1x4 works for us.
> The cluster consists of two dual AMD 2400+ machines with 2 GB RAM;
> one of these is the master node, the other is a computing node.
> There are also four AMD 1900+ machines with 1 GB RAM and ten AMD
> 1900+ machines with 512 MB RAM. The master node has two NICs, and
> the private one (eth1) is connected to a Summit4; all nodes are
> connected to that switch with Fast Ethernet. We use OSCAR 4.2 on
> Fedora Core 3.
Sorry about the delay in replying - somehow this message slipped
through the cracks over the last month :(.
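
One quick thing to check first: the HPL component of HPCC will
refuse to start unless you launch at least P x Q MPI processes, so a
4x4 grid needs "mpirun -np 16" or larger. Assuming your hpccinf.txt
follows the standard HPL.dat-style layout, the relevant lines for a
4x4 run would look roughly like this (values illustrative only, not
tuned for your cluster):

  1            # of process grids (P x Q)
  4            Ps
  4            Qs

Also make sure the problem size (Ns) is small enough to fit in the
memory of your smallest nodes; an N sized for the 2 GB machines will
push the 512 MB nodes into swap or kill the processes outright.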
How is the benchmark failing? Is it crashing, or just appearing to
take a long time? Some parts of the HPCC suite do not scale well,
especially over TCP, so they can take a long time to run as the node
count increases, particularly with sub-optimal tuning parameters. If
it is crashing, a backtrace would be most helpful. If you are seeing
hangs, a backtrace from where it is hanging would still be useful...
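
Attaching a debugger to one of the stuck processes is usually the
easiest way to get that. A rough sketch, assuming gdb is installed
on the nodes and your binary is named hpcc (the stock HPCC build
name):

  # on a node where the job is stuck, find the benchmark's pid
  ps aux | grep hpcc
  # attach without killing the process, dump the stack, then let
  # it continue
  gdb -p <pid>
  (gdb) bt
  (gdb) detach
  (gdb) quit

Doing that on a couple of different nodes and comparing where each
rank is stuck usually points right at the problem.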
Brian
--
Brian Barrett
LAM/MPI developer and all around nice guy
Have a LAM/MPI day: http://www.lam-mpi.org/