
LAM/MPI General User's Mailing List Archives


From: esaifu (esaifu_at_[hidden])
Date: 2006-06-08 09:52:51


Make sure that you have listed all your nodes, including the master, along with
their CPU counts in the "lam-bhost.def" file (this file is located at <LAM
installation path>/etc/lam-bhost.def). You can also try the HPL.dat file that I
am attaching to this mail. Please let me know if it works. If swap is being used
while xhpl is running, just reduce the matrix size in HPL.dat and try again.
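
For illustration, a boot schema for two dual-CPU nodes might look like this
(the hostnames below are just placeholders, not your actual nodes):

    node0.example.com cpu=2
    node1.example.com cpu=2

Each line names a host that lamboot should start a LAM daemon on, and cpu=2
tells mpirun that two processes may be scheduled on that host.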
You can set the matrix size up to about 57965; only then will the system use
the whole memory, and hence you will get better performance.
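
That figure is in line with the usual HPL rule of thumb (rough arithmetic only,
not an exact calculation): choose N so that the N x N matrix of 8-byte doubles
fills most, but not all, of the cluster's memory:

    total memory    ~ 10 nodes x 4 GB        ~ 40 GB
    absolute limit  ~ sqrt(40e9 / 8)         ~ 70,000
    practical limit ~ sqrt(0.7 x 40e9 / 8)   ~ 59,000   (leave headroom for the OS and MPI)

In HPL.dat this is controlled by the problem-size lines, for example (a sketch,
not the attached file):

    1            # of problems sizes (N)
    57000        Ns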
----- Original Message -----
From: "Davide Cittaro" <davide.cittaro_at_[hidden]>
To: <lam_at_[hidden]>
Sent: Thursday, June 08, 2006 5:10 PM
Subject: LAM: xhpl crashes

> Hi there, I'm pretty new to LAM/MPI, so please be patient with me ;-)
> I've installed a 10-node dual Opteron cluster with Gentoo Linux and
> LAM/MPI 7.1.1, connected with Gigabit Ethernet; it works fine (even
> coupled with SGE).
> I would now like to test the cluster with Linpack, so I've downloaded
> and installed xhpl. As I increase the N value (the problem size), it
> crashes. In more detail:
> 10 nodes, 2 CPUs/node, 4 GB RAM/node, running
>
> $ mpirun -np 20 /usr/bin/xhpl
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 12824 failed on node n0 (85.239.175.36) due to signal 9.
> -----------------------------------------------------------------------------
>
> Looking at the HPL.out file, it crashes at N=520... I'm confused; from
> what I read on their website, I should be able to use values up to
> 40000, given my cluster configuration.
>
> $ head -n6 HPL.dat
> HPLinpack benchmark input file
> Innovative Computing Laboratory, University of Tennessee
> HPL.out output file name (if any)
> 1 device out (6=stdout,7=stderr,file)
> 4 # of problems sizes (N)
> 511 515 520 525 Ns
>
> Does anybody here have the same problem?
>
> Thanks
>
> d
>
> /*
> Davide Cittaro
> Bioinformatics Systems @ Informatics Core
>
> IFOM - Istituto FIRC di Oncologia Molecolare
> via adamello, 16
> 20139 Milano
> Italy
>
> tel.: +39(02)574303355
> e-mail: davide.cittaro_at_[hidden]
> */
>



  • application/octet-stream attachment: HPL.dat