Howdy all,
I have been fighting for several days trying to get LAM/MPI to work with
GM 2.0.19 (Myrinet) on our G5 (OS X 10.3.7) cluster. I have been through
the archives and found many possible solutions, but still don't have it
working.
I have tried LAM 7.1.1 and 7.2b1 without success. Here is the information
that I have... any and all suggestions welcome!
I configured LAM 7.2b1 with:
./configure --prefix=/Users/Shared/LAM7.2b/ --with-memory-manager=none
--without-romio --with-rpi=gm --with-rpi-gm=/common/gm --disable-tv-queue
It seems to configure, make, and install properly. I reset my PATH to use
the /Users/Shared/LAM7.2b directory, and "which mpirun" points to the
proper directory.
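For reference, the PATH change amounts to roughly the following (the exact
shell-profile line is paraphrased here; bin/ is the standard subdirectory
under the prefix above):
export PATH=/Users/Shared/LAM7.2b/bin:$PATH
which mpirun        # should print /Users/Shared/LAM7.2b/bin/mpirun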
Running laminfo gives the following:
node065:/usr/local jjp8508$ laminfo
LAM/MPI: 7.2b1svn10122
Prefix: /Users/Shared/LAM7.2b
Architecture: powerpc-apple-darwin7.7.0
Configured by: jjp8508
Configured on: Wed Apr 13 09:32:25 CDT 2005
Configure host: cat.tamu.edu
Memory manager: none
C bindings: yes
C++ bindings: yes
Fortran bindings: yes
C compiler: gcc
C++ compiler: g++
Fortran compiler: f77
Fortran symbols: plain
C profiling: yes
C++ profiling: yes
Fortran profiling: yes
C++ exceptions: no
Thread support: yes
ROMIO support: no
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: gm (API v1.1, Module v1.2)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: self (API v1.0, Module v1.0)
I compiled the lamtest-7.1.1 suite and then do a lamboot:
node065:~/gm jjp8508$ lamboot -v ./machines
LAM 7.2b1svn04132005/MPI 2 C++ - Indiana University
n-1<16740> ssi:boot:base:linear: booting n0 (node065.cluster.private)
n-1<16740> ssi:boot:base:linear: booting n1 (node066.cluster.private)
n-1<16740> ssi:boot:base:linear: booting n2 (node067.cluster.private)
n-1<16740> ssi:boot:base:linear: finished
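For reference, ./machines just lists the three nodes booted above (a sketch;
any per-node cpu= counts are omitted):
node065.cluster.private
node066.cluster.private
node067.cluster.private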
When I try to run a simple test program from the 7.1.1 test suite using gm,
I get the following:
node065:~/gm jjp8508$ mpirun C -ssi rpi gm -ssi rpi_verbose level:1000 ./cpi
n0<16758> ssi:rpi:open: verbosity:1000
n0<16759> ssi:rpi:open: verbosity:1000
n0<16759> ssi:rpi:open: looking for rpi module named gm
n2<26249> ssi:rpi:open: verbosity:1000
n1<18867> ssi:rpi:open: verbosity:1000
n2<26249> ssi:rpi:open: looking for rpi module named gm
n1<18867> ssi:rpi:open: looking for rpi module named gm
n0<16758> ssi:rpi:open: looking for rpi module named gm
-----------------------------------------------------------------------------
The rpi module named "gm" could not be found.
This typically means that you misspelled the desired module name, used
the wrong name entirely, or the module has decided that it does not
want to run in this environment.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
The rpi module named "gm" could not be found.
This typically means that you misspelled the desired module name, used
the wrong name entirely, or the module has decided that it does not
want to run in this environment.
-----------------------------------------------------------------------------
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun did not invoke MPI_INIT before quitting (it is possible that
more than one process did not invoke MPI_INIT -- mpirun was only
notified of the first one, which was on node n0).
mpirun can *only* be used with MPI programs (i.e., programs that
invoke MPI_INIT and MPI_FINALIZE). You can use the "lamexec" program
to run non-MPI programs over the lambooted nodes.
-----------------------------------------------------------------------------
However, the program runs on the exact same nodes simply by changing gm to
tcp:
node065:~/gm jjp8508$ mpirun C -ssi rpi tcp -ssi rpi_verbose level:1000 ../cpi
n0<16792> ssi:rpi:open: verbosity:1000
n0<16792> ssi:rpi:open: looking for rpi module named tcp
n0<16792> ssi:rpi:open: opening rpi module tcp
n0<16792> ssi:rpi:open: opened rpi module tcp
n0<16792> ssi:rpi:query: querying rpi module tcp
n0<16792> ssi:rpi:tcp: module initializing
n0<16792> ssi:rpi:tcp:verbose: 1000
n0<16792> ssi:rpi:tcp:priority: 20
n0<16792> ssi:rpi:query: rpi module available: tcp, priority: 20
__________________________SNIP______________________
Process 0 on node065.cluster.private
Process 1 on node065.cluster.private
Process 2 on node066.cluster.private
Process 3 on node067.cluster.private
pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.000643
n0<16793> ssi:rpi:tcp: module finalizing
n0<16793> ssi:rpi: Closing
n1<18873> ssi:rpi:tcp: module finalizing
n2<26255> ssi:rpi:tcp: module finalizing
n1<18873> ssi:rpi: Closing
n2<26255> ssi:rpi: Closing
n0<16792> ssi:rpi:tcp: module finalizing
n0<16792> ssi:rpi: Closing
Thanks
Jeff
--
Jeff Polasek
Computer Systems Manager
Chemical Engineering Department
Texas A&M University
979-845-3398