
LAM/MPI General User's Mailing List Archives


From: Gui-Bin Liu (goodluck_1982_at_[hidden])
Date: 2006-11-04 09:40:48


Dear all:

I want to use the LAM test suite to test my LAM installation, but I
encountered some problems.
I installed LAM 7.1.2 successfully.
My machine has 8 dual-core Opteron 875 2.2 GHz CPUs, and the OS is FC5,
2.6.15-1.2054_FC5 #1 SMP
x86-64.
The laminfo output is at the bottom.
In the lamtests-7.1.2 directory, after make, I ran
make -k check 2>&1 | tee check.out
and a lot of errors occurred, like this:
================================an error==============================
One of the processes started by mpirun has exited with a nonzero exit
code. This typically indicates that the process finished in error.
If your process did not finish in error, be sure to include a "return
0" or "exit(0)" in your C code before exiting the application.

PID 21815 failed on node n0 (127.0.0.1) with exit status 1.
-----------------------------------------------------------------------------
ERROR: mpirun/allgather_inter/-ssi rpi sysv returned nonzero status
mpirun -x TEST -ssi cr none -s h C -ssi rpi tcp
/root/compiler/lamtests-7.1.2/ccl/intercomm/./allgather_inter
[**ERROR**]: LAM/MPI MPI_COMM_WORLD rank 0, file allgather_inter.c:40:
This test requires an even number of processes to run. Aborting.
=============================an error=================================
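
As an aside, a small helper for pulling the failing tests out of check.out might look like the sketch below. The function names list_failures and count_failures are made up here and are not part of lamtests. The sketch assumes that every failure is reported on a line starting with "ERROR:", as in the excerpt above.

```shell
# list_failures and count_failures are hypothetical helpers, not part of lamtests.
# Assumption: each failed test is reported on a line starting with "ERROR:".

# Print each distinct failure line once, prefixed by how often it appeared.
list_failures() {
    grep '^ERROR:' "$1" | sort | uniq -c
}

# Print the total number of failure lines in a log file.
count_failures() {
    grep -c '^ERROR:' "$1"
}
```

Usage would be something like "count_failures check.out" followed by "list_failures check.out".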
From this I understand that I have to "Boot up a LAM with an even number of
CPUs (e.g., do a successful |lamboot| with at least one node that has two
CPUs, or at least 2 nodes)."
But I cannot boot LAM on two CPUs.
My machine is a shared-memory server with 16 CPUs.
I wrote a hostfile that contains two lines of "localhost" and ran
"lamboot -v hostfile". The output is:

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
n-1<28314> ssi:boot:base:linear: booting n0 (localhost)
n-1<28314> ssi:boot:base:linear: finished

LAM was booted on only one CPU. Hence, when I run "mpirun C myprogram",
only one CPU is used. But when I run "mpirun -np 4 myprogram",
4 CPUs are used.
Most of the mpirun command lines in lamtests have the form "mpirun C xxx",
and I cannot change them, so the errors appear.
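
For reference, the boot step described above can be sketched as follows. The first file is the hostfile I used. The second is only an assumption on my part: the LAM boot schema format accepts a per-host "cpu=N" count, and "mpirun C" starts one process per CPU that LAM knows about, so declaring the count explicitly might be what is needed. The file name hostfile.cpu is made up for this example.

```shell
# The boot schema described above: the same host listed twice.
cat > hostfile <<'EOF'
localhost
localhost
EOF

# Assumed alternative: declare the CPU count explicitly with "cpu=N".
# hostfile.cpu is a hypothetical file name used only in this sketch.
cat > hostfile.cpu <<'EOF'
localhost cpu=2
EOF

# Only call LAM if it is actually installed on this machine.
if command -v lamboot >/dev/null 2>&1; then
    lamboot -v hostfile.cpu   # boot the LAM run-time environment
    lamnodes                  # list the booted nodes and their CPU counts
fi
```

If lamnodes reports two CPUs for n0, "mpirun C" should then start an even number of processes.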
By the way, I have configured rsh correctly. When I run "rsh localhost", I
can log in:
rsh localhost
connect to address 127.0.0.1 port 543: Connection refused
Trying krb4 rlogin...
connect to address 127.0.0.1 port 543: Connection refused
trying normal rlogin (/usr/bin/rlogin)
Last login: Sat Nov 4 05:13:25 from n0

How can I boot LAM on two CPUs, and how can I run lamtests correctly?
Can anyone help me? Thanks in advance!

Yours sincerely
Gui-Bin Liu

laminfo:
LAM/MPI: 7.1.2
Prefix: /usr/local/lam-pgi
Architecture: x86_64-unknown-linux-gnu
Configured by: root
Configured on: Sat Nov 4 01:29:15 CST 2006
Configure host: localhost.localdomain
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: pgcc
C++ compiler: pgCC
Fortran compiler: pgf90
Fortran symbols: underscore
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: self (API v1.0, Module v1.0)