LAM/MPI General User's Mailing List Archives

From: Geoffrey Irving (irving_at_[hidden])
Date: 2005-11-21 11:23:55


On Mon, Nov 21, 2005 at 08:15:50AM -0500, Jeff Squyres wrote:
> On Nov 20, 2005, at 6:18 PM, Geoffrey Irving wrote:
>
> > Ah, apparently I'm not always running 7.1.1 (I just noticed that).
> > I imagine compiling and linking with one version of lam and then
> > running under a different version is not supported. Still, it
> > would be nice if there was a slightly better error message in that
> > case.
>
> Yes, this can definitely be a problem. Also make sure that you're
> running in 32 bit mode on both sides; LAM does not support both 32 and
> 64 bit processes in the same parallel job (more specifically: it may
> work and it may not -- we make no guarantees and do no error checking
> for this case). Let us know if you are able to run correctly after
> getting the versioning issues worked out.

Unfortunately, I fixed the versioning issues and the problem is unchanged.
Here's my laminfo (which is now the same on both machines):

             LAM/MPI: 7.1.1
              Prefix: /solver/adm/lam
        Architecture: x86_64-unknown-linux-gnu
       Configured by: irving
       Configured on: Sun Nov 20 16:16:55 PST 2005
      Configure host: lie
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: no
          C compiler: /usr/local/compilers/gcc-4.0.1-x86_64-x86_64/bin/gcc
        C++ compiler: /usr/local/compilers/gcc-4.0.1-x86_64-x86_64/bin/g++
    Fortran compiler: false
     Fortran symbols: none
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: no
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: self (API v1.0, Module v1.0)

and I can now compile the bug.cpp from the previous email with either of
these:

    mpic++ -o bug bug.cpp
    mpic++ -march=nocona -msse3 -mfpmath=sse -O2 -o bug bug.cpp

If I run it with six processes on the two quad-processor machines (ranks
0, 1, 2, 3 on one machine and 4, 5 on the other), it still produces this
output and hangs:

0: before
5: before
1: before
2: before
3: before
4: before
1: middle
2: middle

I am running in 64-bit mode on both sides (hence the x86_64-x86_64 in the
compiler name). The program works correctly if I run all 6 processes on the
same machine. Hopefully this is enough to reproduce the problem.

> LAM prints version mismatch messages only in certain situations (it's
> unfortunately a complicated issue). Sorry about that. :-(

No problem. I can add my own version-checking code. Thanks for the
great software!

Geoffrey