On Mon, Nov 21, 2005 at 08:15:50AM -0500, Jeff Squyres wrote:
> On Nov 20, 2005, at 6:18 PM, Geoffrey Irving wrote:
>
> > Ah, apparently I'm not always running 7.1.1 (I just noticed that).
> > I imagine compiling and linking with one version of lam and then
> > running under a different version is not supported. Still, it
> > would be nice if there was a slightly better error message in that
> > case.
>
> Yes, this can definitely be a problem. Also make sure that you're
> running in 32 bit mode on both sides; LAM does not support both 32 and
> 64 bit processes in the same parallel job (more specifically: it may
> work and it may not -- we make no guarantees and do no error checking
> for this case). Let us know if you are able to run correctly after
> getting the versioning issues worked out.

Unfortunately, I fixed the versioning issues and the problem is unchanged.
Here's my laminfo (which is now the same on both machines):
LAM/MPI: 7.1.1
Prefix: /solver/adm/lam
Architecture: x86_64-unknown-linux-gnu
Configured by: irving
Configured on: Sun Nov 20 16:16:55 PST 2005
Configure host: lie
Memory manager: ptmalloc2
C bindings: yes
C++ bindings: yes
Fortran bindings: no
C compiler: /usr/local/compilers/gcc-4.0.1-x86_64-x86_64/bin/gcc
C++ compiler: /usr/local/compilers/gcc-4.0.1-x86_64-x86_64/bin/g++
Fortran compiler: false
Fortran symbols: none
C profiling: yes
C++ profiling: yes
Fortran profiling: no
C++ exceptions: no
Thread support: yes
ROMIO support: yes
IMPI support: no
Debug support: no
Purify clean: no
SSI boot: globus (API v1.1, Module v0.6)
SSI boot: rsh (API v1.1, Module v1.1)
SSI boot: slurm (API v1.1, Module v1.0)
SSI coll: lam_basic (API v1.1, Module v7.1)
SSI coll: shmem (API v1.1, Module v1.0)
SSI coll: smp (API v1.1, Module v1.2)
SSI rpi: crtcp (API v1.1, Module v1.1)
SSI rpi: lamd (API v1.0, Module v7.1)
SSI rpi: sysv (API v1.0, Module v7.1)
SSI rpi: tcp (API v1.0, Module v7.1)
SSI rpi: usysv (API v1.0, Module v7.1)
SSI cr: self (API v1.0, Module v1.0)

and I can now compile the bug.cpp from the previous email with either of
these:

    mpic++ -o bug bug.cpp
    mpic++ -march=nocona -msse3 -mfpmath=sse -O2 -o bug bug.cpp

If I run it with six processes on the two quad-processor machines (0,1,2,3
on one machine and 4,5 on the other), it still produces this output and
hangs:
0: before
5: before
1: before
2: before
3: before
4: before
1: middle
2: middle

I am running in 64-bit mode on both sides (thus the x86_64-x86_64 in the
compiler name). The program works correctly if I run all 6 processes on the
same machine. Hopefully this is enough to reproduce the problem.

> LAM prints version mismatch messages only in certain situations (it's
> unfortunately a complicated issue). Sorry about that. :-(

No problem. I can add my own version checking code. Thanks for the
great software!
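
A minimal sketch of what such a check could look like, assuming each rank
contributes a version string (for example one embedded at build time and
collected on rank 0 with MPI_Gather; that collection step is omitted here).
The helper name mismatched_ranks is hypothetical and not part of LAM/MPI:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not a LAM/MPI function.
// versions[r] is the version string reported by rank r (assumed to have
// been gathered already). Returns the ranks whose string differs from
// the one reported by rank 0.
std::vector<int> mismatched_ranks(const std::vector<std::string>& versions)
{
    std::vector<int> bad;
    for (std::size_t r = 1; r < versions.size(); ++r) {
        if (versions[r] != versions[0]) {
            bad.push_back(static_cast<int>(r));
        }
    }
    return bad;
}
```

Rank 0 could print the returned ranks and abort the job before any real
communication starts if the list is non-empty.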
Geoffrey