I'm running the Linpack benchmark (HPL) on 16 machines, and the test residuals are huge, so all three residual checks fail.
When I run Linpack on a different set of 16 machines, it works. I don't know why it works
on that group of machines but not on the first group.
The output of the failing test is:
============================================================================
HPLinpack 1.0 -- High-Performance Linpack benchmark -- September 27, 2000
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 30000
NB : 180
P : 4
Q : 4
PFACT : Crout
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : BlongM
DEPTH : 0
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
----------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
============================================================================
T/V N NB P Q Time Gflops
----------------------------------------------------------------------------
W05C2C4 30000 180 4 4 926.10 1.944e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 60806833.2204566 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 91855846.7188683 ...... FAILED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 18090934.5147445 ...... FAILED
||Ax-b||_oo . . . . . . . . . . . . . . . . . = 1.539482
||A||_oo . . . . . . . . . . . . . . . . . . . = 7597.038667
||A||_1 . . . . . . . . . . . . . . . . . . . = 7601.349361
||x||_oo . . . . . . . . . . . . . . . . . . . = 3.363084
||x||_1 . . . . . . . . . . . . . . . . . . . = 19859.432598
============================================================================
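As a sanity check on the numbers above, the scaled residuals can be recomputed from the printed norms. The following is a minimal Python sketch; `scaled_residuals` and the variable names are my own and are not part of HPL, and every input value is copied from the failing output. The first two results reproduce the reported values closely. The third matches the reported 18090934.5 only after also dividing by N, which is an observation from the arithmetic and not something the printed formula states.

```python
# Recompute the scaled residuals from the norms printed in the failing run.
# scaled_residuals is a hypothetical helper for illustration, not an HPL function.

def scaled_residuals(resid, a_norm_1, a_norm_inf, x_norm_1, x_norm_inf, n, eps):
    """Return the three scaled residuals exactly as the printed formulas read."""
    r1 = resid / (eps * a_norm_1 * n)
    r2 = resid / (eps * a_norm_1 * x_norm_1)
    r3 = resid / (eps * a_norm_inf * x_norm_inf)
    return r1, r2, r3

EPS = 1.110223e-16   # relative machine precision from the output
N = 30000            # order of the coefficient matrix
THRESHOLD = 16.0     # tests pass if the scaled residuals are below this

r1, r2, r3 = scaled_residuals(
    resid=1.539482,          # ||Ax-b||_oo
    a_norm_1=7601.349361,    # ||A||_1
    a_norm_inf=7597.038667,  # ||A||_oo
    x_norm_1=19859.432598,   # ||x||_1
    x_norm_inf=3.363084,     # ||x||_oo
    n=N,
    eps=EPS,
)

for label, value in (("check 1", r1), ("check 2", r2), ("check 3", r3)):
    status = "PASSED" if value < THRESHOLD else "FAILED"
    print(f"{label}: {value:.7e} ... {status}")

# Observation only: the reported third value corresponds to r3 / N.
print(f"check 3 divided by N: {r3 / N:.7e}")
```

Since ||Ax-b||_oo itself is about 1.5 here, the failure comes from a genuinely wrong solution and not from an overly strict threshold.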
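When Linpack works, the output is as follows (note that this run used N = 27500 and DEPTH = 1, while the failing run used N = 30000 and DEPTH = 0):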
============================================================================
HPLinpack 1.0 -- High-Performance Linpack benchmark -- September 27, 2000
Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK
============================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 27500
NB : 180
P : 4
Q : 4
PFACT : Crout
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : BlongM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
----------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual checks will be computed:
1) ||Ax-b||_oo / ( eps * ||A||_1 * N )
2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 )
3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
============================================================================
T/V N NB P Q Time Gflops
----------------------------------------------------------------------------
W15C2C4 27500 180 4 4 764.35 1.814e+01
----------------------------------------------------------------------------
||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.1817086 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0127607 ...... PASSED
||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0025336 ...... PASSED
============================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
----------------------------------------------------------------------------
End of Tests.
============================================================================
I run the tests with mpirun:
mpirun n1-16 -np 16 xhpl
I tried looking for errors in the RAM modules with memtest and for network errors with
netperf, but I didn't see anything abnormal.
If anybody knows why Linpack doesn't work on the first group of machines, please email me at:
jcarlos_at_[hidden]