I am currently working with a 6 computer homogenous cluster running TAO
Linux. I have successfully installed and run LAM and many of the
included examples.
I am attempting to perform a benchmark using HPL.
I downloaded and made HPL with the following make file:
#######################################
############# Make.cluster ############
#######################################
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = cluster
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = /mnt/beowulf/hpl
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /usr/local
MPinc = -I$(MPdir)/include
MPlib = /usr/local/lib/liblamf77mpi.a
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /mnt/beowulf/ATLAS
LAinc = -I$(LAdir)/include
LAlib = $(LAdir)/lib/Brite05/libcblas.a
$(LAdir)/lib/Brite05/libatlas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the Fortran 77 BLAS interface
# *) not display detailed timing information.
#
HPL_OPTS = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC = /usr/bin/mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
LINKER = /usr/bin/mpicc
LINKFLAGS =
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------
#######################################
#######################################
This successfully ran and created an executable in the bin/cluster
directory...
I then issued the command lamboot -v /mnt/beowulf/lamhosts which gave
the output:
#######################################
########### LAMBOOT OUTPUT ############
#######################################
[beowulf_at_Wulf00 hpl]$ lamboot -v /mnt/beowulf/lamhosts
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
n-1<3812> ssi:boot:base:linear: booting n0 (Wulf00)
n-1<3812> ssi:boot:base:linear: booting n1 (Wulf01)
n-1<3812> ssi:boot:base:linear: booting n2 (Wulf02)
n-1<3812> ssi:boot:base:linear: booting n3 (Wulf03)
n-1<3812> ssi:boot:base:linear: booting n4 (Wulf04)
n-1<3812> ssi:boot:base:linear: booting n5 (Wulf05)
n-1<3812> ssi:boot:base:linear: finished
#######################################
#######################################
I attempted to run the executable on all 6 nodes in the cluster using:
[beowulf_at_Wulf00 hpl]$ mpirun n0-5 /mnt/beowulf/hpl/bin/cluster/xhpl
which produced the output:
#######################################
############ MPIRUN OUTPUT ############
#######################################
It seems that there is no lamd running on this host, which indicates
that the LAM/MPI runtime environment is not operating. The LAM/MPI
runtime environment is necessary for MPI programs to run (the MPI
program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
-----------------------------------------------------------------------------
It seems that there is no lamd running on this host, which indicates
that the LAM/MPI runtime environment is not operating. The LAM/MPI
runtime environment is necessary for MPI programs to run (the MPI
program tired to invoke the "MPI_Init" function).
Please run the "lamboot" command the start the LAM/MPI runtime
environment. See the LAM/MPI documentation for how to invoke
"lamboot" across multiple machines.
#######################################
#######################################
A combination of these messages were repeated, but I did not want to
paste in the redundant messages and just waste space.
I then searched the mailing list archive for hpl and found this post:
http://www.lam-mpi.org/MailArchives/lam/2003/10/6943.php
I went back in and made a copy of my Make.cluster and renamed it to
Make.second. I modified the mpInc and MPlib variables like the post
suggested:
#######################################
############## Make.second ############
#######################################
# ----------------------------------------------------------------------
# - shell --------------------------------------------------------------
# ----------------------------------------------------------------------
#
SHELL = /bin/sh
#
CD = cd
CP = cp
LN_S = ln -s
MKDIR = mkdir
RM = /bin/rm -f
TOUCH = touch
#
# ----------------------------------------------------------------------
# - Platform identifier ------------------------------------------------
# ----------------------------------------------------------------------
#
ARCH = second
#
# ----------------------------------------------------------------------
# - HPL Directory Structure / HPL library ------------------------------
# ----------------------------------------------------------------------
#
TOPdir = /mnt/beowulf/hpl
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
#
HPLlib = $(LIBdir)/libhpl.a
#
# ----------------------------------------------------------------------
# - MPI directories - library ------------------------------------------
# ----------------------------------------------------------------------
# MPinc tells the C compiler where to find the Message Passing library
# header files, MPlib is defined to be the name of the library to be
# used. The variable MPdir is only used for defining MPinc and MPlib.
#
MPdir = /usr/local
MPinc =
MPlib =
#
# ----------------------------------------------------------------------
# - Linear Algebra library (BLAS or VSIPL) -----------------------------
# ----------------------------------------------------------------------
# LAinc tells the C compiler where to find the Linear Algebra library
# header files, LAlib is defined to be the name of the library to be
# used. The variable LAdir is only used for defining LAinc and LAlib.
#
LAdir = /mnt/beowulf/ATLAS
LAinc = -I$(LAdir)/include
LAlib = $(LAdir)/lib/Brite05/libcblas.a
$(LAdir)/lib/Brite05/libatlas.a
#
# ----------------------------------------------------------------------
# - F77 / C interface --------------------------------------------------
# ----------------------------------------------------------------------
# You can skip this section if and only if you are not planning to use
# a BLAS library featuring a Fortran 77 interface. Otherwise, it is
# necessary to fill out the F2CDEFS variable with the appropriate
# options. **One and only one** option should be chosen in **each** of
# the 3 following categories:
#
# 1) name space (How C calls a Fortran 77 routine)
#
# -DAdd_ : all lower case and a suffixed underscore (Suns,
# Intel, ...), [default]
# -DNoChange : all lower case (IBM RS6000),
# -DUpCase : all upper case (Cray),
# -DAdd__ : the FORTRAN compiler in use is f2c.
#
# 2) C and Fortran 77 integer mapping
#
# -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default]
# -DF77_INTEGER=long : Fortran 77 INTEGER is a C long,
# -DF77_INTEGER=short : Fortran 77 INTEGER is a C short.
#
# 3) Fortran 77 string handling
#
# -DStringSunStyle : The string address is passed at the string loca-
# tion on the stack, and the string length is then
# passed as an F77_INTEGER after all explicit
# stack arguments, [default]
# -DStringStructPtr : The address of a structure is passed by a
# Fortran 77 string, and the structure is of the
# form: struct {char *cp; F77_INTEGER len;},
# -DStringStructVal : A structure is passed by value for each Fortran
# 77 string, and the structure is of the form:
# struct {char *cp; F77_INTEGER len;},
# -DStringCrayStyle : Special option for Cray machines, which uses
# Cray fcd (fortran character descriptor) for
# interoperation.
#
F2CDEFS = -DAdd_ -DF77_INTEGER=int -DStringSunStyle
#
# ----------------------------------------------------------------------
# - HPL includes / libraries / specifics -------------------------------
# ----------------------------------------------------------------------
#
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
#
# - Compile time options -----------------------------------------------
#
# -DHPL_COPY_L force the copy of the panel L before bcast;
# -DHPL_CALL_CBLAS call the cblas interface;
# -DHPL_CALL_VSIPL call the vsip library;
# -DHPL_DETAILED_TIMING enable detailed timers;
#
# By default HPL will:
# *) not copy L before broadcast,
# *) call the Fortran 77 BLAS interface
# *) not display detailed timing information.
#
HPL_OPTS = -DHPL_CALL_CBLAS
#
# ----------------------------------------------------------------------
#
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
#
# ----------------------------------------------------------------------
# - Compilers / linkers - Optimization flags ---------------------------
# ----------------------------------------------------------------------
#
CC = /usr/bin/mpicc
CCNOOPT = $(HPL_DEFS)
CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O3 -funroll-loops -W -Wall
#
LINKER = /usr/bin/mpicc
LINKFLAGS =
#
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
#
# ----------------------------------------------------------------------
#######################################
#######################################
I then tried running the new executable:
[beowulf_at_Wulf00 hpl]$ mpirun n0-5 /mnt/beowulf/hpl/bin/second/xhpl
and got the same output.
I have attempted many different combinations of nodes and every time get
the same result.
I halted the lam for verification that it was running:
[beowulf_at_Wulf00 hpl]$ lamhalt -v
LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University
Shutting down LAM
hreq: waiting for HALT ACKs from remote LAM daemons
hreq: received HALT_ACK from n1 (Wulf01)
hreq: received HALT_ACK from n2 (Wulf02)
hreq: received HALT_ACK from n3 (Wulf03)
hreq: received HALT_ACK from n4 (Wulf04)
hreq: received HALT_ACK from n5 (Wulf05)
hreq: received HALT_ACK from n0 (Wulf00)
LAM halted
I have searched google and the mailing list archives for a solution and
have been unable to get anything to work.
I hope someone will be able to offer a suggestion, thank you for your time!
-Jesse Harvey
|