The error message is telling you that you have different versions of
the lamd RPI (not the lamd executable) on your machines.
So I think you want to check which versions of LAM you have installed
on each machine. If all else fails, you might want to just
uninstall and reinstall LAM on both machines to guarantee that you have
the same version on both.
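One way to run that check is to collect laminfo from every node through rsh. lamboot's rsh boot module starts lamd through rsh, so a non-interactive rsh shell may see a different PATH than your login shell. Note that the two laminfo listings quoted below already report different prefixes (/usr on Machine_B, /usr/local on Machine_A). The following is a minimal Python sketch, not a LAM tool. The host names and helper names are hypothetical, and it assumes `rsh` and `laminfo` work exactly as in the commands quoted in this thread.

```python
import subprocess

# Hypothetical host names; substitute the nodes listed in your boot schema (hostfile).
HOSTS = ["Machine_A", "Machine_B"]


def summarize_laminfo(text):
    """Extract the LAM version, install prefix, and lamd RPI line from laminfo output."""
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key in ("LAM/MPI", "Prefix"):
            summary[key] = value
        elif key == "SSI rpi" and value.startswith("lamd"):
            summary["lamd rpi"] = value
    return summary


def remote_laminfo(host):
    """Run laminfo on `host` through a non-interactive rsh shell and return its stdout."""
    result = subprocess.run(
        ["rsh", host, "laminfo"], capture_output=True, text=True, check=True
    )
    return result.stdout


def report(hosts=HOSTS):
    """Print one summary per host so version or prefix differences stand out."""
    for host in hosts:
        print(host, summarize_laminfo(remote_laminfo(host)))
```

Calling `report()` from the node where you run lamboot prints one line per host. Any disagreement in the LAM/MPI version, Prefix, or lamd RPI entry is worth chasing before reinstalling.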
On Jul 9, 2007, at 12:06 PM, trymelz trymelz wrote:
> Jeff,
>
> Thanks for the information, but...
>
> [Machine_A] rsh Machine_B 'which lamd'
> /usr/bin/lamd
>
> [Machine_B] which lamd
> /usr/bin/lamd
>
> where Machine_A runs ranks 0 and 1, and Machine_B runs rank 2.
>
> Linfa
>
> Jeff Squyres <jsquyres_at_[hidden]> wrote:
>
> It looks like you have a version mismatch of LAM/MPI between your two
> nodes. The error message is telling you that it found two different
> versions of the lamd RPI on two nodes:
>
> MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> MPI_COMM_WORLD rank 2: lamd (v7.0.0)
>
> Your laminfo is showing that you have 7.1.3 installed on both nodes,
> but you might want to check for PATH differences on non-interactive
> logins.
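To check whether a non-interactive login resolves a different LAM than your interactive shell, compare the PATH each one reports. For example, compare the output of `rsh Machine_B 'echo $PATH'` with `echo $PATH` in a login shell on Machine_B, then list every matching executable in search order. Not every `which` supports `-a`, so this hypothetical Python helper, `which_all`, walks PATH itself. The first hit is what the shell would run, and later hits are shadowed. The mismatch reported here is in the RPI module rather than the lamd binary, so treat this as a first check only.

```python
import os


def _is_executable(path):
    """Default check: a regular file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def which_all(cmd, path, is_executable=_is_executable):
    """List every `cmd` found along a PATH string, in the order the shell searches.

    The first entry is what would actually run; later entries are shadowed.
    `is_executable` is injectable so the lookup can be checked without touching disk.
    """
    hits = []
    for directory in path.split(os.pathsep):
        # An empty PATH component means the current directory.
        candidate = os.path.join(directory or ".", cmd)
        if is_executable(candidate):
            hits.append(candidate)
    return hits
```

Running `which_all("lamd", ...)` and `which_all("mpirun", ...)` on both PATH strings shows whether /usr/local/bin shadows /usr/bin in one shell but not the other.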
>
>
> On Jul 6, 2007, at 4:40 PM, trymelz trymelz wrote:
>
> >
> > Hi,
> >
> > Does anyone have an idea about the "mismatched in their RPI selections"
> > problem? Thanks.
> >
> > 1.lamboot -v hostfile3
> >
> > LAM 7.1.3/MPI 2 C++/ROMIO - Indiana University
> >
> > n-1<14838> ssi:boot:base:linear: booting n0 (64-bit Linux machine_A)
> > n-1<14838> ssi:boot:base:linear: booting n1 (32-bit Linux machine_B)
> > n-1<14838> ssi:boot:base:linear: finished
> >
> > 2. mpirun -ssi rpi lamd program
> >
> >
> > -----------------------------------------------------------------------------
> > It seems that [at least] one of the processes that was started with
> > mpirun chose a different RPI than its peers. For example, at least
> > the following two processes mismatched in their RPI selections:
> >
> > MPI_COMM_WORLD rank 0: lamd (v7.1.0)
> > MPI_COMM_WORLD rank 2: lamd (v7.0.0)
> >
> > All MPI processes must choose the same RPI module and version when
> > they start. Check your SSI settings and/or the local environment
> > variables on each node.
> >
> > -----------------------------------------------------------------------------
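The message above says to check SSI settings and per-node environment variables. As far as I know, LAM/MPI reads SSI parameters from environment variables whose names start with `LAM_MPI_SSI_`, for example `LAM_MPI_SSI_rpi`, the environment counterpart of `-ssi rpi lamd`. Treat that prefix as an assumption and confirm it against the user's guide for your version. The Python sketch below compares these variables across nodes. It assumes you capture each node's environment with `rsh <host> env`, so the result reflects the non-interactive shell. The helper names `parse_env_output` and `ssi_settings` are hypothetical.

```python
import os

# Assumed LAM/MPI convention: SSI parameter NAME can be set as LAM_MPI_SSI_NAME.
SSI_PREFIX = "LAM_MPI_SSI_"


def parse_env_output(text):
    """Parse `env` output ("NAME=value" per line) into a dict.

    Values containing embedded newlines are not handled; this is a quick check.
    """
    env = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            env[name] = value
    return env


def ssi_settings(environ=None):
    """Return SSI parameters found in `environ` (default: this process), keyed by name."""
    if environ is None:
        environ = os.environ
    return {
        name[len(SSI_PREFIX):]: value
        for name, value in environ.items()
        if name.startswith(SSI_PREFIX)
    }
```

Apply `ssi_settings(parse_env_output(text))` to each node's `env` output. Any parameter that is set on one node but not the other, or set to a different value, is a candidate for the mismatch.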
> >
> > 3. [Machine_A]$ rsh Machine_B laminfo
> > LAM/MPI: 7.1.3
> > Prefix: /usr
> > Architecture: i686-pc-linux-gnu
> > Configured by: linfa
> > Configured on: Fri Jul 6 13:12:05 CDT 2007
> > Configure host: Machine_B
> > Memory manager: ptmalloc2
> > C bindings: yes
> > C++ bindings: yes
> > Fortran bindings: yes
> > C compiler: gcc
> > C++ compiler: g++
> > Fortran compiler: g77
> > Fortran symbols: double_underscore
> > C profiling: yes
> > C++ profiling: yes
> > Fortran profiling: yes
> > C++ exceptions: no
> > Thread support: yes
> > ROMIO support: yes
> > IMPI support: no
> > Debug support: no
> > Purify clean: no
> > SSI boot: globus (API v1.1, Module v0.6)
> > SSI boot: rsh (API v1.1, Module v1.1)
> > SSI boot: slurm (API v1.1, Module v1.0)
> > SSI coll: lam_basic (API v1.1, Module v7.1)
> > SSI coll: shmem (API v1.1, Module v1.0)
> > SSI coll: smp (API v1.1, Module v1.2)
> > SSI rpi: crtcp (API v1.1, Module v1.1)
> > SSI rpi: lamd (API v1.0, Module v7.1)
> > SSI rpi: sysv (API v1.0, Module v7.1)
> > SSI rpi: tcp (API v1.0, Module v7.1)
> > SSI rpi: usysv (API v1.0, Module v7.1)
> > SSI cr: self (API v1.0, Module v1.0)
> >
> > 4. [Machine_A]$ laminfo
> > LAM/MPI: 7.1.3
> > Prefix: /usr/local
> > Architecture: x86_64-unknown-linux-gnu
> > Configured by: linfa
> > Configured on: Tue Jun 26 16:07:16 CDT 2007
> > Configure host: Machine_A
> > Memory manager: ptmalloc2
> > C bindings: yes
> > C++ bindings: yes
> > Fortran bindings: yes
> > C compiler: /opt/intel/cce/9.0/bin/icc
> > C++ compiler: /opt/intel/cce/9.0/bin/icpc
> > Fortran compiler: /opt/intel/fce/9.0/bin/ifort
> > Fortran symbols: underscore
> > C profiling: yes
> > C++ profiling: yes
> > Fortran profiling: yes
> > C++ exceptions: no
> > Thread support: yes
> > ROMIO support: yes
> > IMPI support: no
> > Debug support: no
> > Purify clean: no
> > SSI boot: globus (API v1.1, Module v0.6)
> > SSI boot: rsh (API v1.1, Module v1.1)
> > SSI boot: slurm (API v1.1, Module v1.0)
> > SSI coll: lam_basic (API v1.1, Module v7.1)
> > SSI coll: shmem (API v1.1, Module v1.0)
> > SSI coll: smp (API v1.1, Module v1.2)
> > SSI rpi: crtcp (API v1.1, Module v1.1)
> > SSI rpi: lamd (API v1.0, Module v7.1)
> > SSI rpi: sysv (API v1.0, Module v7.1)
> > SSI rpi: tcp (API v1.0, Module v7.1)
> > SSI rpi: usysv (API v1.0, Module v7.1)
> > SSI cr: self (API v1.0, Module v1.0)
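Comparing two laminfo listings by eye is error-prone. This small Python sketch uses two hypothetical helpers, `parse_laminfo` and `diff_laminfo`, which are not part of LAM. It parses each listing into "Key: value" pairs, keeps repeated keys such as `SSI rpi` as lists, and reports only the keys whose values differ.

```python
def parse_laminfo(text):
    """Map each laminfo key to the list of its values (keys like "SSI rpi" repeat)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info.setdefault(key.strip(), []).append(value.strip())
    return info


def diff_laminfo(text_a, text_b):
    """Return {key: (values_a, values_b)} for every key whose values differ."""
    a, b = parse_laminfo(text_a), parse_laminfo(text_b)
    return {
        key: (a.get(key, []), b.get(key, []))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, []) != b.get(key, [])
    }
```

Applied to the two listings above, it reports these keys: Architecture, C compiler, C++ compiler, Configure host, Configured on, Fortran compiler, Fortran symbols, and Prefix. Every SSI line is identical on both nodes, including `SSI rpi: lamd (API v1.0, Module v7.1)`. Neither listing shows the v7.0.0 lamd RPI from the mpirun error. That suggests, though it does not prove, that the process on Machine_B is not using the installation laminfo describes.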
> >
> >
> > _______________________________________________
> > This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>
>
> --
> Jeff Squyres
> Cisco Systems
>
--
Jeff Squyres
Cisco Systems