LAM/MPI General User's Mailing List Archives

From: Guibin Liu (goodluck_1982_at_[hidden])
Date: 2006-12-21 22:42:24


Tim Prince wrote:
Guibin Liu wrote:
  
Dear all:

I installed lam-7.1.2 on our cluster. Each node has a
Pentium D 945 3.4 GHz dual-core CPU (EM64T).
I installed LAM into an NFS-shared directory.

=============some command results=================
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamboot -v hf

LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University

n-1<10074> ssi:boot:base:linear: booting n0 (ClusterServer)
n-1<10074> ssi:boot:base:linear: booting n1 (n23)
n-1<10074> ssi:boot:base:linear: booting n2 (n24)
n-1<10074> ssi:boot:base:linear: finished
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> cat hf
ClusterServer cpu=2
n23
n23
n24
n24
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamnodes
n0      ClusterServer.cluster.t02:2:origin,this_node
n1      n23.cluster.t02:2:
n2      n24.cluster.t02:2:
====================================================

It seems that lamboot completed correctly.
But when I run lamtests-7.1.2, problems occur.
In the top directory of lamtests-7.1.2, configure and make succeed.
But when I run "make -k check", it hangs at the first test and stops
there.
The output is as follows:
-----------------------output----------------------
gbliu@ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> make -k check
Making check in reporting
make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
make[1]: Nothing to be done for `check'.
make[1]: Leaving directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
Making check in ccl
make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl'
Making check in intercomm
make[2]: Entering directory 
`/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
make  check-TESTS
make[3]: Entering directory 
`/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
mpirun -x TEST -ssi cr none -s h C -ssi rpi crtcp 
/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/./allgather_inter
MPI_Comm_accept: unclassified: Bad address (rank 0, comm 4)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD):  - MPI_Comm_accept()
Rank (0, MPI_COMM_WORLD):  - main()
                                         
---------------------------------------------------
After a long time, the output is still the same and CPU usage is 0.
I pressed Ctrl-C to cancel it and then ran "lamnodes", but this time
lamnodes also hung with no output. Only after I ran lamboot again
did lamnodes work properly.
  I don't know what the problem is. Can someone help me?

                                  Yours sincerely
                                  Guibin Liu


====================================================
laminfo
            LAM/MPI: 7.1.2
             Prefix: /cluster/lammpi-7.1.2
       Architecture: x86_64-unknown-linux-gnu
    

As you have built your lam for x86-64 (64-bit architecture), you must
make sure you don't mix it with an incompatible lam version, or with
objects or libraries built for 32-bit architecture.  Such mixtures would
produce the sort of hang you mention.
  
  I am sure that no LAM was installed on my system before I installed lam-7.1.2.
  But how do I check that I am not mixing it with objects or libraries built for 32-bit architecture?
  I just ran configure, make, and make install. What else should I configure?
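
  One possible way to check, offered as a rough sketch and not as an official LAM procedure: every ELF object records its word size in byte 5 of its header (01 for 32-bit, 02 for 64-bit), so a small script can report the class of each library and test binary. The script name elfclass.sh and the lib/ subdirectory under the prefix are assumptions for illustration; the standard "file" command reports the same information where it is installed.

```shell
#!/bin/sh
# Report the ELF class (32- or 64-bit) of each file given on the command line.
# Byte 5 of an ELF header (EI_CLASS) is 01 for 32-bit and 02 for 64-bit.
elf_class() {
    # Read the first five bytes as hex and strip spaces and newlines.
    hdr=$(od -An -tx1 -N5 "$1" 2>/dev/null | tr -d ' \n')
    case "$hdr" in
        7f454c4601) echo ELF32 ;;
        7f454c4602) echo ELF64 ;;
        *)          echo not-ELF ;;
    esac
}

# Hypothetical usage (paths are assumptions, adjust to your install):
#   sh elfclass.sh /cluster/lammpi-7.1.2/lib/* ./allgather_inter
for f in "$@"; do
    printf '%s: %s\n' "$f" "$(elf_class "$f")"
done
```

  Any file reported as ELF32 alongside the x86_64 build would be a candidate for the kind of mixture described above. Static archives (.a files) and scripts show up as not-ELF, so they would need to be inspected member by member with other tools.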


_______________________________________________ This list is archived at http://www.lam-mpi.org/MailArchives/lam/