Hi and thanks...
On Jun 8, 2006, at 3:52 PM, esaifu wrote:
> Make sure that you have listed all your nodes including master
> along with its cpu count in the "lam-bhost.def " file
I have.
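(For reference, this is roughly what my lam-bhost.def contains -- the exact
hostnames are placeholders here, based on the n0/n2 names that show up in the
errors below, with one line per dual-CPU node:)

n0 cpu=2
n1 cpu=2
# ... n2 through n8, one line each, same form ...
n9 cpu=2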
> (This file will be in <LAM installation path>/etc/lam-bhost.def.) You
> can also try the HPL.dat file which I am attaching along with this
> mail.
This is the output using your HPL.dat:
$ mpirun -np 20 /usr/bin/xhpl
================================================================================
HPLinpack 1.0a  --  High-Performance Linpack benchmark  --  January 20, 2004
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Labs., UTK
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 57965
NB : 200
PMAP : Row-major process mapping
P : 4
Q : 5
PFACT : Left Crout Right
NBMIN : 8
NDIV : 2
RFACT : Right
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 200)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
--------------------------------------------------------------------------------
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.
PID 27606 failed on node n2 (85.239.175.38) due to signal 9.
--------------------------------------------------------------------------------
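For reference, signal 9 is SIGKILL, which on Linux is what the out-of-memory
killer sends, so I did a back-of-envelope check of what your HPL.dat asks for:
the dense double-precision matrix alone is N^2 * 8 bytes spread over the
4x5 = 20 ranks, i.e. two ranks per 4 GB node, before HPL's panel workspace and
LAM's buffers are counted. This is just my own arithmetic (plain Python), not
anything HPL or LAM prints:

N = 57965                       # problem size from the HPL.dat above
P, Q = 4, 5                     # process grid -> P * Q = 20 ranks
nodes = 10
ranks_per_node = (P * Q) // nodes       # 2 ranks per dual-CPU node

matrix_bytes = N * N * 8                # dense double-precision matrix
per_rank_gib = matrix_bytes / (P * Q) / 2.0**30
per_node_gib = per_rank_gib * ranks_per_node

print("matrix total : %.1f GiB" % (matrix_bytes / 2.0**30))    # ~25.0 GiB
print("per rank     : %.2f GiB" % per_rank_gib)                # ~1.25 GiB
print("per node     : %.2f GiB of 4 GB RAM" % per_node_gib)    # ~2.50 GiB

That leaves maybe 1.5 GB per node for HPL's workspace, LAM and the OS, which is
tight; on the other hand my own runs below already die at N=520, where memory
clearly isn't the issue, so I'm not sure memory is the whole story.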
I tried xhpl on another small heterogeneous LAM/MPI cluster (7.0.6) and it
works. Is 7.1.1 that different? I've read elsewhere that one can have
problems allocating memory for MPI processes (well, that was with MPICH and
its P4_GLOBMEMSIZE variable), but is anything like that configurable under
LAM/MPI?
> Please let me know if it works. If swap is being used while running
> xhpl, just reduce the matrix size in the HPL.dat file and try again.
> You can set the matrix size up to 57965; only then will the system
> use the whole memory.
I would like to do this... but I have a 515 limit :-(
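(Just so I understand the sizing rule: the guideline I've seen is to let the
matrix fill some fraction of total RAM, often around 80%; the 0.80 below is my
assumption, not a LAM or HPL setting. By the same formula your 57965 would
correspond to roughly 63% of my 10 x 4 GB. A quick sketch:)

import math

nodes = 10
mem_per_node_bytes = 4 * 2**30          # 4 GB per node
fraction = 0.80                         # assumed headroom for OS/MPI/workspace

n_max = int(math.sqrt(fraction * nodes * mem_per_node_bytes / 8.0))  # 8 bytes/double
n_max -= n_max % 200                    # round down to a multiple of NB = 200
print(n_max)                            # 65400 with these numbers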
d
> Hence you
> will get better performance.
> ----- Original Message ----- From: "Davide Cittaro"
> <davide.cittaro_at_[hidden]>
> To: <lam_at_[hidden]>
> Sent: Thursday, June 08, 2006 5:10 PM
> Subject: LAM: xhpl crashes
>
>
>> Hi there, I'm pretty new to LAM/MPI, so please be patient with me ;-)
>> I've installed a cluster of 10 dual-Opteron nodes with Gentoo Linux and
>> LAM/MPI 7.1.1, connected over Gigabit Ethernet; it works fine (even
>> coupled with SGE).
>> Now I would like to test the cluster with Linpack, so I've downloaded
>> and installed xhpl. As I increase the N value (the problem size) it
>> crashes. More in detail: 10 nodes, 2 CPUs/node, 4 GB RAM/node, running
>>
>> $ mpirun -np 20 /usr/bin/xhpl
>> --------------------------------------------------------------------------------
>> One of the processes started by mpirun has exited with a nonzero exit
>> code. This typically indicates that the process finished in error.
>> If your process did not finish in error, be sure to include a "return
>> 0" or "exit(0)" in your C code before exiting the application.
>>
>> PID 12824 failed on node n0 (85.239.175.36) due to signal 9.
>> --------------------------------------------------------------------------------
>>
>> Looking at the HPL.out file, it crashes at N=520... I'm confused: from
>> what I read on their website, I should be able to use values up to
>> 40000 with my cluster configuration.
>>
>> $ head -n6 HPL.dat
>> HPLinpack benchmark input file
>> Innovative Computing Laboratory, University of Tennessee
>> HPL.out output file name (if any)
>> 1 device out (6=stdout,7=stderr,file)
>> 4 # of problems sizes (N)
>> 511 515 520 525 Ns
>>
>> Does anybody here have the same problem?
>>
>> Thanks
>>
>> d
>>
>> /*
>> Davide Cittaro
>> Bioinformatics Systems @ Informatics Core
>>
>> IFOM - Istituto FIRC di Oncologia Molecolare
>> via adamello, 16
>> 20139 Milano
>> Italy
>>
>> tel.: +39(02)574303355
>> e-mail: davide.cittaro_at_[hidden]
>> */
>>
>>
>>
>>
>> <HPL.dat>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
/*
Davide Cittaro
Bioinformatics Systems @ Informatics Core
IFOM - Istituto FIRC di Oncologia Molecolare
via adamello, 16
20139 Milano
Italy
tel.: +39(02)574303355
e-mail: davide.cittaro_at_[hidden]
*/