Hi there, I'm pretty new to LAM/MPI, so please be patient with me ;-)
I've installed a cluster of 10 dual-Opteron nodes running Gentoo Linux and
LAM/MPI 7.1.1, connected with Gigabit Ethernet. It works fine (even
coupled with SGE).
Now I would like to test the cluster with Linpack, so I've downloaded
and installed xhpl. The problem is that it crashes as I increase N
(the problem size). In more detail, with 10 nodes, 2 CPUs/node and
4 GB RAM/node, running:
$ mpirun -np 20 /usr/bin/xhpl
-----------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return 0"
or "exit(0)" in your C code before exiting the application.
PID 12824 failed on node n0 (85.239.175.36) due to signal 9.
-----------------------------------------------------------------------------
Looking at the HPL.out file, it crashes at N=520. I'm confused: from
what I read on their website, I should be able to use values up to
40000 with my cluster configuration.
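For a rough sense of scale, here is a small sketch that estimates HPL's dominant memory cost, the N x N double-precision coefficient matrix (8 bytes per entry), and the largest N that fits in a given share of total RAM. It assumes all 10 x 4 GB is usable and uses the commonly quoted rule of thumb of filling about 80% of memory; it ignores OS, MPI and workspace overhead, and the helper names are hypothetical, made up for illustration.

```python
import math

BYTES_PER_DOUBLE = 8  # HPL factors a dense matrix of doubles


def matrix_bytes(n: int) -> int:
    """Memory taken by the N x N coefficient matrix alone (no overhead)."""
    return n * n * BYTES_PER_DOUBLE


def max_problem_size(total_bytes: int, percent: int = 80) -> int:
    """Largest N whose matrix fits in `percent`% of total memory.

    The 80% default is an assumed rule of thumb, not an HPL requirement.
    Integer arithmetic avoids floating-point rounding in the result.
    """
    usable = total_bytes * percent // 100
    return math.isqrt(usable // BYTES_PER_DOUBLE)


GIB = 1024 ** 3
TOTAL = 10 * 4 * GIB  # assumed: 10 nodes x 4 GB each, all of it usable
PROCS = 20            # mpirun -np 20

for n in (520, 40000):
    mib = matrix_bytes(n) / 2**20
    print(f"N={n:>6}: {mib:10.1f} MiB total, {mib / PROCS:8.1f} MiB per process")
print("80% rule of thumb -> N up to about", max_problem_size(TOTAL))
```

By this estimate N=40000 needs about 12.8 GB for the matrix (roughly 600 MiB per process across 20 processes), while N=520 needs only about 2 MB, so at N=520 the matrix itself is nowhere near the 4 GB available per node.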
$ head -n6 HPL.dat
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out output file name (if any)
1 device out (6=stdout,7=stderr,file)
4 # of problems sizes (N)
511 515 520 525 Ns
Does anybody here have the same problem?
Thanks
d
/*
Davide Cittaro
Bioinformatics Systems @ Informatics Core
IFOM - Istituto FIRC di Oncologia Molecolare
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303355
e-mail: davide.cittaro_at_[hidden]
*/