Hi, Jeff,
Thanks. I think our problem is in step 3b. I have three machines:
Machine A: LAM 7.0.6; builds executable A, which is copied to Machine B
Machine B: LAM 7.1.3; receives executable A from Machine A
Machine C: LAM 7.1.3; builds executable C
I run executable A on Machine B together with executable C on Machine C. I will update the LAM version on Machine A if possible and let you know the results.
Thanks.
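
A stale executable like this can be spotted from the mpirun error itself, since each mismatch line names a rank and the RPI version that rank reported. The following is a minimal Python sketch, not part of LAM: the function names are hypothetical, the message format is assumed to match the error lines quoted further down this thread, and the rank-to-host table is only an illustration taken from an earlier message.

```python
import re
from collections import defaultdict

# Matches lines such as "MPI_COMM_WORLD rank 0: lamd (v7.1.0)".
# The format is assumed from the error text quoted in this thread.
RANK_LINE = re.compile(r"MPI_COMM_WORLD rank (\d+):\s*(\S+)\s*\(v([\d.]+)\)")


def parse_mismatch(message):
    """Return {rank: (rpi_name, version)} for every rank line in the message."""
    found = {}
    for line in message.splitlines():
        match = RANK_LINE.search(line)
        if match:
            found[int(match.group(1))] = (match.group(2), match.group(3))
    return found


def group_by_version(ranks, rank_to_host):
    """Group host names by the (rpi_name, version) pair their rank reported."""
    groups = defaultdict(list)
    for rank, rpi in sorted(ranks.items()):
        groups[rpi].append(rank_to_host.get(rank, "unknown"))
    return dict(groups)


if __name__ == "__main__":
    error_text = """
    MPI_COMM_WORLD rank 0: lamd (v7.1.0)
    MPI_COMM_WORLD rank 2: lamd (v7.0.0)
    """
    # Illustrative rank layout; substitute the one from your own boot schema.
    hosts = {0: "Machine_A", 1: "Machine_A", 2: "Machine_B"}
    grouped = group_by_version(parse_mismatch(error_text), hosts)
    for (name, version), members in grouped.items():
        print(f"{name} v{version}: {', '.join(members)}")
```

Any host that ends up alone in a group is a candidate for running a binary that was built against a different LAM installation.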
Jeff Squyres <jsquyres_at_[hidden]> wrote:
If this problem is still occurring, then you must still somehow have
remnants of different versions of LAM somewhere. Here's what I would
do...
1. Uninstall every copy of LAM from your machines. Make them be 100%
   LAM-free.
2. Re-install *only* 7.1.3 on both machines.
   (I think you've done 1-2 already, but I wanted to mention this to
   be complete)
3. Recompile your application with the new LAM installation.
   a. If your app is available via a network filesystem to all
      nodes, you're done.
   b. If your app must be distributed to all nodes, either build it
      on every node or manually distribute it to all nodes.
4. Run it with the new LAM installation.
You should be good. If you're still getting the mismatch message,
let us know.
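
To confirm that a reinstall took effect, the laminfo output from each node can be compared mechanically. The sketch below is an illustration only: the helper names are hypothetical, it assumes the laminfo text has already been captured for each node, and it assumes the `LAM/MPI:` and `SSI rpi:` line formats that appear in the laminfo output quoted later in this thread.

```python
import re

# Line formats assumed from the laminfo output quoted in this thread, e.g.
#   "LAM/MPI: 7.1.3"
#   "SSI rpi: lamd (API v1.0, Module v7.1)"
VERSION_LINE = re.compile(r"LAM/MPI:\s*(\S+)")
PREFIX_LINE = re.compile(r"Prefix:\s*(\S+)")
RPI_LINE = re.compile(r"SSI rpi:\s*(\S+)\s*\(API v([\d.]+), Module v([\d.]+)\)")


def parse_laminfo(text):
    """Extract the LAM version, install prefix, and RPI module versions."""
    info = {"version": None, "prefix": None, "rpi": {}}
    for line in text.splitlines():
        version = VERSION_LINE.search(line)
        prefix = PREFIX_LINE.search(line)
        rpi = RPI_LINE.search(line)
        if version and info["version"] is None:
            info["version"] = version.group(1)
        elif prefix and info["prefix"] is None:
            info["prefix"] = prefix.group(1)
        elif rpi:
            info["rpi"][rpi.group(1)] = rpi.group(3)
    return info


def differing_fields(reports):
    """List the fields whose values are not identical on every node.

    `reports` maps a node name to the dict produced by parse_laminfo().
    The prefix is ignored on purpose: nodes may install LAM in different
    directories and still run matching versions.
    """
    mismatched = []
    if len({info["version"] for info in reports.values()}) > 1:
        mismatched.append("LAM/MPI")
    names = sorted({name for info in reports.values() for name in info["rpi"]})
    for name in names:
        # A module missing on one node shows up as None and counts as a mismatch.
        if len({info["rpi"].get(name) for info in reports.values()}) > 1:
            mismatched.append(f"rpi {name}")
    return mismatched
```

An empty result only shows that the installed copies of LAM agree. As the rest of this thread shows, an application binary built against an older LAM can still report a different RPI version at run time, so the application itself must also be rebuilt as in step 3.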
On Jul 12, 2007, at 11:03 AM, trymelz trymelz wrote:
> Laminfo outputs the lamd version 7.1.0 on both machines (both
> under interactive and non-interactive).
>
> I installed lam on both machines from
>
> http://www.lam-mpi.org/download/files/lam-7.1.3.tar.gz
>
> Jeff Squyres wrote:
> What is the output from laminfo on both machines? It should show the
> version of the lamd RPI.
>
> How are you installing on both machines, from a source 7.1.3 tarball,
> or from some other kind of package?
>
>
> On Jul 11, 2007, at 10:30 AM, trymelz trymelz wrote:
>
> > Hi, Jeff,
> >
> > Do you know how to check the version of the lamd RPI? It shows
> > version 7.1.0 by laminfo (both interactive and non-interactive). I
> > had an old version of lam installed, but I removed all of them.
> > Then I tried to uninstall/configure/make/install the newest version
> > again. But the same problem is still there.
> >
> > I believe that the RPI is using some libraries coming with lam, so
> > I am wondering if it is possible to check these libraries to see
> > their version. Thanks
> >
> > Linfa
> >
> > Jeff Squyres wrote:
> > The error message is telling you that you have different versions of
> > the lamd RPI (not the lamd executable) on your different machines.
> > So I think you want to check what versions of LAM you have installed
> > on each machine. If all else fails, you might want to just
> > uninstall / reinstall LAM on both machines to guarantee that you have
> > the same versions.
> >
> >
> > On Jul 9, 2007, at 12:06 PM, trymelz trymelz wrote:
> >
> > > Jeff,
> > >
> > > Thanks for your information. but...
> > >
> > > [Machine_A] rsh Machine_B 'which lamd'
> > > /usr/bin/lamd
> > >
> > > [Machine_B] which lamd
> > > /usr/bin/lamd
> > >
> > > where Machine_A is rank 0 and rank 1, and Machine_B is rank 2
> > >
> > > Linfa
> > >
> > > Jeff Squyres wrote:
> > > It looks like you have a version mismatch of LAM/MPI between your two
> > > nodes. The error message is telling you that it found two different
> > > versions of the lamd RPI on two nodes:
> > >
> > > MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> > > MPI_COMM_WORLD rank 2: lamd (v7.0.0)
> > >
> > > Your laminfo is showing that you have 7.1.3 installed on both nodes,
> > > but you might want to check for PATH differences on non-interactive
> > > logins.
> > >
> > >
> > > On Jul 6, 2007, at 4:40 PM, trymelz trymelz wrote:
> > >
> > > >
> > > > Hi,
> > > >
> > > > Does anyone have an idea about the "mismatched in their RPI selections"
> > > > problem? Thanks.
> > > >
> > > > 1.lamboot -v hostfile3
> > > >
> > > > LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
> > > >
> > > > n-1<14838> ssi:boot:base:linear: booting n0 (64-bit Linux machine_A)
> > > > n-1<14838> ssi:boot:base:linear: booting n1 (32-bit Linux machine_B)
> > > > n-1<14838> ssi:boot:base:linear: finished
> > > >
> > > > 2. mpirun -ssi rpi lamd program
> > > >
> > > >
> > > > -----------------------------------------------------------------------------
> > > > It seems that [at least] one of the processes that was started with
> > > > mpirun chose a different RPI than its peers. For example, at least
> > > > the following two processes mismatched in their RPI selections:
> > > >
> > > > MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> > > > MPI_COMM_WORLD rank 2: lamd (v7.0.0)
> > > >
> > > > All MPI processes must choose the same RPI module and version when
> > > > they start. Check your SSI settings and/or the local environment
> > > > variables on each node.
> > > >
> > > > -----------------------------------------------------------------------------
> > > >
> > > > 3. [Machine_A]$ rsh Machine_B laminfo
> > > > LAM/MPI: 7.1.3
> > > > Prefix: /usr
> > > > Architecture: i686-pc-linux-gnu
> > > > Configured by: linfa
> > > > Configured on: Fri Jul 6 13:12:05 CDT 2007
> > > > Configure host: Machine_B
> > > > Memory manager: ptmalloc2
> > > > C bindings: yes
> > > > C++ bindings: yes
> > > > Fortran bindings: yes
> > > > C compiler: gcc
> > > > C++ compiler: g++
> > > > Fortran compiler: g77
> > > > Fortran symbols: double_underscore
> > > > C profiling: yes
> > > > C++ profiling: yes
> > > > Fortran profiling: yes
> > > > C++ exceptions: no
> > > > Thread support: yes
> > > > ROMIO support: yes
> > > > IMPI support: no
> > > > Debug support: no
> > > > Purify clean: no
> > > > SSI boot: globus (API v1.1, Module v0.6)
> > > > SSI boot: rsh (API v1.1, Module v1.1)
> > > > SSI boot: slurm (API v1.1, Module v1.0)
> > > > SSI coll: lam_basic (API v1.1, Module v7.1)
> > > > SSI coll: shmem (API v1.1, Module v1.0)
> > > > SSI coll: smp (API v1.1, Module v1.2)
> > > > SSI rpi: crtcp (API v1.1, Module v1.1)
> > > > SSI rpi: lamd (API v1.0, Module v7.1)
> > > > SSI rpi: sysv (API v1.0, Module v7.1)
> > > > SSI rpi: tcp (API v1.0, Module v7.1)
> > > > SSI rpi: usysv (API v1.0, Module v7.1)
> > > > SSI cr: self (API v1.0, Module v1.0)
> > > >
> > > > 4. [Machine_A]$ laminfo
> > > > LAM/MPI: 7.1.3
> > > > Prefix: /usr/local
> > > > Architecture: x86_64-unknown-linux-gnu
> > > > Configured by: linfa
> > > > Configured on: Tue Jun 26 16:07:16 CDT 2007
> > > > Configure host: Machine_A
> > > > Memory manager: ptmalloc2
> > > > C bindings: yes
> > > > C++ bindings: yes
> > > > Fortran bindings: yes
> > > > C compiler: /opt/intel/cce/9.0/bin/icc
> > > > C++ compiler: /opt/intel/cce/9.0/bin/icpc
> > > > Fortran compiler: /opt/intel/fce/9.0/bin/ifort
> > > > Fortran symbols: underscore
> > > > C profiling: yes
> > > > C++ profiling: yes
> > > > Fortran profiling: yes
> > > > C++ exceptions: no
> > > > Thread support: yes
> > > > ROMIO support: yes
> > > > IMPI support: no
> > > > Debug support: no
> > > > Purify clean: no
> > > > SSI boot: globus (API v1.1, Module v0.6)
> > > > SSI boot: rsh (API v1.1, Module v1.1)
> > > > SSI boot: slurm (API v1.1, Module v1.0)
> > > > SSI coll: lam_basic (API v1.1, Module v7.1)
> > > > SSI coll: shmem (API v1.1, Module v1.0)
> > > > SSI coll: smp (API v1.1, Module v1.2)
> > > > SSI rpi: crtcp (API v1.1, Module v1.1)
> > > > SSI rpi: lamd (API v1.0, Module v7.1)
> > > > SSI rpi: sysv (API v1.0, Module v7.1)
> > > > SSI rpi: tcp (API v1.0, Module v7.1)
> > > > SSI rpi: usysv (API v1.0, Module v7.1)
> > > > SSI cr: self (API v1.0, Module v1.0)
> > > >
> > > >
> > > > _______________________________________________
> > > > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
> > >
> > >
> > > --
> > > Jeff Squyres
> > > Cisco Systems
> > >
> >
> >
> > --
> > Jeff Squyres
> > Cisco Systems
> >
>
>
> --
> Jeff Squyres
> Cisco Systems
>
--
Jeff Squyres
Cisco Systems