LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-07-16 17:26:28


Yes, this is definitely a problem. LAM never made any claims about
binary compatibility between versions, which is one of the reasons
that we put this version check in place.

Sorry it was so confusing; glad you finally got it figured out!
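
For anyone who hits the same message, here is a minimal sketch of how the rank lines in the mismatch error could be grouped by reported RPI version, to see which ranks were built against which LAM. It is not part of LAM/MPI; the function names are hypothetical, and it assumes the lines look exactly like the ones quoted below ("MPI_COMM_WORLD rank N: lamd (vX.Y.Z)").

```python
import re
from collections import defaultdict

# Matches lines such as "MPI_COMM_WORLD rank 0: lamd (v7.1.0)".
# Assumption: this is the only line format of interest; it is taken
# from the error text quoted in this thread, not from LAM's sources.
RANK_LINE = re.compile(r"MPI_COMM_WORLD rank (\d+):\s+(\w+)\s+\(v([\d.]+)\)")


def group_ranks_by_rpi(text):
    """Map (rpi_name, version) -> sorted list of ranks reporting it."""
    groups = defaultdict(set)
    for match in RANK_LINE.finditer(text):
        rank, name, version = match.groups()
        groups[(name, version)].add(int(rank))
    return {key: sorted(ranks) for key, ranks in groups.items()}


def has_mismatch(text):
    """True if the ranks do not all report the same RPI module and version."""
    return len(group_ranks_by_rpi(text)) > 1


sample = """\
MPI_COMM_WORLD rank 0: lamd (v7.1.0)
MPI_COMM_WORLD rank 2: lamd (v7.0.0)
"""

if __name__ == "__main__":
    for (name, version), ranks in sorted(group_ranks_by_rpi(sample).items()):
        print(f"{name} v{version}: ranks {ranks}")
```

The version in each line comes from the LAM that the executable was built against, so a rank that reports an older version points at a binary that needs to be recompiled, even when laminfo shows the same release on every node.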

On Jul 16, 2007, at 5:22 PM, trymelz trymelz wrote:

> Hi, Jeff,
>
> Thanks. I guess we have the problem on 3b. I have 3 machines
>
> Machine A: lam 7.0.6, build executable A, copy it to Machine B
> Machine B: lam 7.1.3, get executable A from Machine A
> Machine C: lam 7.1.3, build executable C
>
> Run executable A on Machine B with executable C on Machine C. I
> will update the lam version on Machine A if possible and let you
> know the results
>
> Thanks.
>
>
>
> Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> If this problem is still occurring, then you must still somehow have
> remnants of different versions of LAM somewhere. Here's what I would
> do...
>
> 1. Uninstall every copy of LAM from your machines. Make them be 100%
>    LAM-free.
> 2. Re-install *only* 7.1.3 on both machines.
>    (I think you've done 1-2 already, but I wanted to mention this to
>    be complete)
> 3. Recompile your application with the new LAM installation.
>    a. If your app is available via a network filesystem to all
>       nodes, you're done
>    b. If your app must be distributed to all nodes, either build it
>       on every node or manually distribute it to all nodes
> 4. Run it with the new LAM installation
>
> You should be good. If you're still getting the mismatch message,
> let us know.
>
>
> On Jul 12, 2007, at 11:03 AM, trymelz trymelz wrote:
>
> > Laminfo outputs the lamd version 7.1.0 on both machines (both
> > under interactive and non-interactive).
> >
> > I installed lam on both machines from
> >
> > http://www.lam-mpi.org/download/files/lam-7.1.3.tar.gz
> >
> > Jeff Squyres wrote:
> >
> > What is the output from laminfo on both machines? It should show the
> > version of the lamd RPI.
> >
> > How are you installing on both machines, from a source 7.1.3 tarball,
> > or from some other kind of package?
> >
> >
> > On Jul 11, 2007, at 10:30 AM, trymelz trymelz wrote:
> >
> > > Hi, Jeff,
> > >
> > > Do you know how to check the version of the lamd RPI? It shows
> > > version 7.1.0 by laminfo (both interactive and non-interactive). I
> > > had an old version of lam installed, but I removed all of them.
> > > Then I tried to uninstall/configure/make/install the newest version
> > > again. But the same problem is still there.
> > >
> > > I believe that the RPI is using some libraries coming with lam, so
> > > I am wondering if it is possible to check these libraries to see
> > > their version. Thanks
> > >
> > > Linfa
> > >
> > > Jeff Squyres wrote:
> > >
> > > The error message is telling you that you have different versions of
> > > the lamd RPI (not the lamd executable) on your different machines.
> > > So I think you want to check what versions of LAM you have installed
> > > on each machine. If all else fails, you might want to just
> > > uninstall / reinstall LAM on both machines to guarantee that you have
> > > the same versions.
> > >
> > >
> > > On Jul 9, 2007, at 12:06 PM, trymelz trymelz wrote:
> > >
> > > > Jeff,
> > > >
> > > > Thanks for your information. but...
> > > >
> > > > [Machine_A] rsh Machine_B 'which lamd'
> > > > /usr/bin/lamd
> > > >
> > > > [Machine_B] which lamd
> > > > /usr/bin/lamd
> > > >
> > > > where Machine_A is rank 0 and rank 1, and Machine_B is rank 2
> > > >
> > > > Linfa
> > > >
> > > > Jeff Squyres wrote:
> > > >
> > > > It looks like you have a version mismatch of LAM/MPI between your two
> > > > nodes. The error message is telling you that it found two different
> > > > versions of the lamd RPI on two nodes:
> > > >
> > > > MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> > > > MPI_COMM_WORLD rank 2: lamd (v7.0.0)
> > > >
> > > > Your laminfo is showing that you have 7.1.3 installed on both nodes,
> > > > but you might want to check for PATH differences on non-interactive
> > > > logins.
> > > >
> > > >
> > > > On Jul 6, 2007, at 4:40 PM, trymelz trymelz wrote:
> > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > Does anyone have an idea about the "mismatched in their RPI selections"
> > > > > problem? Thanks.
> > > > >
> > > > > 1. lamboot -v hostfile3
> > > > >
> > > > > LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
> > > > >
> > > > > n-1<14838> ssi:boot:base:linear: booting n0 (64-bit Linux machine_A)
> > > > > n-1<14838> ssi:boot:base:linear: booting n1 (32-bit Linux machine_B)
> > > > > n-1<14838> ssi:boot:base:linear: finished
> > > > >
> > > > > 2. mpirun -ssi rpi lamd program
> > > > >
> > > > >
> > > > > -----------------------------------------------------------------------------
> > > > > It seems that [at least] one of the processes that was started with
> > > > > mpirun chose a different RPI than its peers. For example, at least
> > > > > the following two processes mismatched in their RPI selections:
> > > > >
> > > > > MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> > > > > MPI_COMM_WORLD rank 2: lamd (v7.0.0)
> > > > >
> > > > > All MPI processes must choose the same RPI module and version when
> > > > > they start. Check your SSI settings and/or the local environment
> > > > > variables on each node.
> > > > >
> > > > > -----------------------------------------------------------------------------
> > > > >
> > > > > 3. [Machine_A]$ rsh Machine_B laminfo
> > > > > LAM/MPI: 7.1.3
> > > > > Prefix: /usr
> > > > > Architecture: i686-pc-linux-gnu
> > > > > Configured by: linfa
> > > > > Configured on: Fri Jul 6 13:12:05 CDT 2007
> > > > > Configure host: Machine_B
> > > > > Memory manager: ptmalloc2
> > > > > C bindings: yes
> > > > > C++ bindings: yes
> > > > > Fortran bindings: yes
> > > > > C compiler: gcc
> > > > > C++ compiler: g++
> > > > > Fortran compiler: g77
> > > > > Fortran symbols: double_underscore
> > > > > C profiling: yes
> > > > > C++ profiling: yes
> > > > > Fortran profiling: yes
> > > > > C++ exceptions: no
> > > > > Thread support: yes
> > > > > ROMIO support: yes
> > > > > IMPI support: no
> > > > > Debug support: no
> > > > > Purify clean: no
> > > > > SSI boot: globus (API v1.1, Module v0.6)
> > > > > SSI boot: rsh (API v1.1, Module v1.1)
> > > > > SSI boot: slurm (API v1.1, Module v1.0)
> > > > > SSI coll: lam_basic (API v1.1, Module v7.1)
> > > > > SSI coll: shmem (API v1.1, Module v1.0)
> > > > > SSI coll: smp (API v1.1, Module v1.2)
> > > > > SSI rpi: crtcp (API v1.1, Module v1.1)
> > > > > SSI rpi: lamd (API v1.0, Module v7.1)
> > > > > SSI rpi: sysv (API v1.0, Module v7.1)
> > > > > SSI rpi: tcp (API v1.0, Module v7.1)
> > > > > SSI rpi: usysv (API v1.0, Module v7.1)
> > > > > SSI cr: self (API v1.0, Module v1.0)
> > > > >
> > > > > 4. [Machine_A]$ laminfo
> > > > > LAM/MPI: 7.1.3
> > > > > Prefix: /usr/local
> > > > > Architecture: x86_64-unknown-linux-gnu
> > > > > Configured by: linfa
> > > > > Configured on: Tue Jun 26 16:07:16 CDT 2007
> > > > > Configure host: Machine_A
> > > > > Memory manager: ptmalloc2
> > > > > C bindings: yes
> > > > > C++ bindings: yes
> > > > > Fortran bindings: yes
> > > > > C compiler: /opt/intel/cce/9.0/bin/icc
> > > > > C++ compiler: /opt/intel/cce/9.0/bin/icpc
> > > > > Fortran compiler: /opt/intel/fce/9.0/bin/ifort
> > > > > Fortran symbols: underscore
> > > > > C profiling: yes
> > > > > C++ profiling: yes
> > > > > Fortran profiling: yes
> > > > > C++ exceptions: no
> > > > > Thread support: yes
> > > > > ROMIO support: yes
> > > > > IMPI support: no
> > > > > Debug support: no
> > > > > Purify clean: no
> > > > > SSI boot: globus (API v1.1, Module v0.6)
> > > > > SSI boot: rsh (API v1.1, Module v1.1)
> > > > > SSI boot: slurm (API v1.1, Module v1.0)
> > > > > SSI coll: lam_basic (API v1.1, Module v7.1)
> > > > > SSI coll: shmem (API v1.1, Module v1.0)
> > > > > SSI coll: smp (API v1.1, Module v1.2)
> > > > > SSI rpi: crtcp (API v1.1, Module v1.1)
> > > > > SSI rpi: lamd (API v1.0, Module v7.1)
> > > > > SSI rpi: sysv (API v1.0, Module v7.1)
> > > > > SSI rpi: tcp (API v1.0, Module v7.1)
> > > > > SSI rpi: usysv (API v1.0, Module v7.1)
> > > > > SSI cr: self (API v1.0, Module v1.0)
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > > >
> > > >
> > > > --
> > > > Jeff Squyres
> > > > Cisco Systems
> > >
> > >
> > > --
> > > Jeff Squyres
> > > Cisco Systems
> >
> >
> > --
> > Jeff Squyres
> > Cisco Systems
>
>
> --
> Jeff Squyres
> Cisco Systems

-- 
Jeff Squyres
Cisco Systems