LAM/MPI General User's Mailing List Archives

From: Tim Prince (timothyprince_at_[hidden])
Date: 2006-12-21 16:44:22


Guibin Liu wrote:
> Dear all:
>
> I installed lam-7.1.2 on our cluster. Each node of our cluster has a
> Pentium D 945 3.4 GHz dual-core CPU (EM64T).
> I installed LAM to an NFS-shared directory.
>
> =============some command results=================
> gbliu_at_ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamboot -v hf
>
> LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
>
> n-1<10074> ssi:boot:base:linear: booting n0 (ClusterServer)
> n-1<10074> ssi:boot:base:linear: booting n1 (n23)
> n-1<10074> ssi:boot:base:linear: booting n2 (n24)
> n-1<10074> ssi:boot:base:linear: finished
> gbliu_at_ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> cat hf
> ClusterServer cpu=2
> n23
> n23
> n24
> n24
> gbliu_at_ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> lamnodes
> n0 ClusterServer.cluster.t02:2:origin,this_node
> n1 n23.cluster.t02:2:
> n2 n24.cluster.t02:2:
> ====================================================
>
> It seems that lamboot completed correctly.
> But when I run lamtests-7.1.2, problems occur.
> In the top directory of lamtests-7.1.2, configure and make complete
> successfully. When I then run "make -k check", it hangs at the first
> test and stops there.
> The output is as follows:
> -----------------------output----------------------
> gbliu_at_ClusterServer:/cluster/soft/MPI/lamtests-7.1.2> make -k check
> Making check in reporting
> make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
> make[1]: Nothing to be done for `check'.
> make[1]: Leaving directory `/cluster/soft/MPI/lamtests-7.1.2/reporting'
> Making check in ccl
> make[1]: Entering directory `/cluster/soft/MPI/lamtests-7.1.2/ccl'
> Making check in intercomm
> make[2]: Entering directory
> `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
> make check-TESTS
> make[3]: Entering directory
> `/cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm'
> mpirun -x TEST -ssi cr none -s h C -ssi rpi crtcp
> /cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/./allgather_inter
> MPI_Comm_accept: unclassified: Bad address (rank 0, comm 4)
> Rank (0, MPI_COMM_WORLD): Call stack within LAM:
> Rank (0, MPI_COMM_WORLD): - MPI_Comm_accept()
> Rank (0, MPI_COMM_WORLD): - main()
>
> ---------------------------------------------------
> After a long time, the output is still the same and CPU usage is 0.
> I press Ctrl-C to cancel it and then run "lamnodes", but this time
> lamnodes also hangs and prints nothing. Only after I run lamboot again
> does lamnodes work normally.
> I don't know what the problem is. Can someone help me?
>
> Yours sincerely
> Guibin Liu
>
>
> ====================================================
> laminfo
> LAM/MPI: 7.1.2
> Prefix: /cluster/lammpi-7.1.2
> Architecture: x86_64-unknown-linux-gnu

As you have built your LAM for x86-64 (64-bit architecture), you must
make sure you don't mix it with an incompatible LAM version, or with
objects or libraries built for 32-bit architecture. Such mixtures would
produce the sort of hang you mention.
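
One way to look for such a mixture is to compare the word size of the test binary with that of the LAM executables found on the PATH. The following is only a minimal sketch, not part of LAM: the helper name elf_class is hypothetical, and it assumes a Linux system where the files in question are ELF objects. It reads the EI_CLASS byte of the ELF header (offset 4), which is 1 for 32-bit and 2 for 64-bit objects.

```shell
# Hypothetical helper (not part of LAM): report the ELF class of a file.
# Reads the 4-byte ELF magic, then the EI_CLASS byte at offset 4
# (1 = 32-bit, 2 = 64-bit).
elf_class() {
    # First four bytes as hex, with spaces and newlines removed.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        # Not an ELF object (e.g. a script or a static .a archive).
        echo "not-elf"
        return 1
    fi
    # Fifth byte as an unsigned decimal number.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}

# Example usage (paths taken from the output quoted above; adjust as needed):
#   elf_class /cluster/soft/MPI/lamtests-7.1.2/ccl/intercomm/allgather_inter
#   elf_class "$(command -v mpirun)"
#   elf_class "$(command -v lamboot)"
```

If the test binary and the LAM tools report different classes, or if "command -v mpirun" resolves to a directory other than the prefix shown by laminfo, that would be consistent with the kind of mixture described above. Running the same check on each node is worthwhile, since the PATH may differ between nodes.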