LAM/MPI General User's Mailing List Archives

From: Shreyas Bhatewara (pictguys_at_[hidden])
Date: 2005-07-15 01:06:23


--- Jeff Squyres <jsquyres_at_[hidden]> wrote:
Hello Jeff,
Thanks for helping, but that is not the case: I have
installed 7.1.1 on both nodes.

After I posted this mail I ran the test suite. All the
tests PASSed except two, final.c and abort.c, which are
expected to fail. All the other example applications
(e.g., wave, mandelbrot) also run perfectly well, except
for two: alltoall (in the alltoall folder) and chapter_10
(in the cxx folder). These two failing applications give
the error messages I described in the earlier mail.

The PATH and the laminfo output for both interactive and
non-interactive logins, from each node to every other
node, are exactly the same (the version numbers match).

So, in short, the whole test suite and all the example
applications run fine except for alltoall and chapter_10.
Is there any problem with these two applications in
particular? Do they have any specific requirements?
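
One quick way to isolate this (a sketch; the per-example Makefiles and
paths assume the 7.1.1 example tree and may differ) is to rebuild the
two failing examples with the 7.1.1 compiler wrappers, so that a binary
linked against an older installation cannot cause a version mismatch:

$ which mpicc            # expect /home/gscluster/lam/bin/mpicc
$ cd examples/alltoall
$ make clean && make
$ mpirun C -ssi rpi tcp alltoall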

I am appending some of the commands I used to check for
version and PATH mismatches.

******************** PATH ***********************
[gscluster_at_seine gscluster]694:) rsh prawra echo $PATH
connect to address 192.168.0.171: Connection refused
Trying krb4 rsh...
connect to address 192.168.0.171: Connection refused
trying normal rsh (/usr/bin/rsh)
Changes to path done
/home/gscluster/lam/bin:/usr/kerberos/bin:/usr/local/bin:/usr/bin:/bin:/usr/X11R6/bin:/home/gscluster/bin
[gscluster_at_seine gscluster]695:) rsh prawra
connect to address 192.168.0.171: Connection refused
Trying krb4 rlogin...
connect to address 192.168.0.171: Connection refused
trying normal rlogin (/usr/bin/rlogin)
Last login: Fri Jul 15 10:58:22 from seine.gs-lab.com
Changes to path done
[gscluster_at_prawra ~]$ echo $PATH
/home/gscluster/lam/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:/home/gscluster/bin
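
Note that without quoting, "rsh prawra echo $PATH" expands $PATH on the
local machine before rsh runs, so the first transcript above may
actually show the local PATH; quoting is needed to read the remote,
non-interactive value. A compact comparison (a sketch; "bash -l -c"
approximates a login shell here and assumes bash is the login shell):

$ rsh prawra 'echo $PATH' | tr ':' '\n' | sort > /tmp/noninteractive
$ rsh prawra 'bash -l -c "echo \$PATH"' | tr ':' '\n' | sort > /tmp/interactive
$ diff /tmp/noninteractive /tmp/interactive   # empty output = same directories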

******************** laminfo ******************
[gscluster_at_seine gscluster]715:) rsh prawra laminfo
connect to address 192.168.0.171: Connection refused
Trying krb4 rsh...
connect to address 192.168.0.171: Connection refused
trying normal rsh (/usr/bin/rsh)
Changes to path done
             LAM/MPI: 7.1.1
              Prefix: /home/gscluster/lam
        Architecture: i686-pc-linux-gnu
       Configured by: gscluster
       Configured on: Tue Jul 12 18:01:07 IST 2005
      Configure host: prawra.gs-lab.com
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: cc
        C++ compiler: g++
    Fortran compiler: f77
     Fortran symbols: double_underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: self (API v1.0, Module v1.0)
[gscluster_at_seine gscluster]716:) rsh prawra
connect to address 192.168.0.171: Connection refused
Trying krb4 rlogin...
connect to address 192.168.0.171: Connection refused
trying normal rlogin (/usr/bin/rlogin)
Last login: Fri Jul 15 11:17:16 from seine
Changes to path done
[gscluster_at_prawra ~]$ laminfo
             LAM/MPI: 7.1.1
              Prefix: /home/gscluster/lam
        Architecture: i686-pc-linux-gnu
       Configured by: gscluster
       Configured on: Tue Jul 12 18:01:07 IST 2005
      Configure host: prawra.gs-lab.com
      Memory manager: ptmalloc2
          C bindings: yes
        C++ bindings: yes
    Fortran bindings: yes
          C compiler: cc
        C++ compiler: g++
    Fortran compiler: f77
     Fortran symbols: double_underscore
         C profiling: yes
       C++ profiling: yes
   Fortran profiling: yes
      C++ exceptions: no
      Thread support: yes
       ROMIO support: yes
        IMPI support: no
       Debug support: no
        Purify clean: no
            SSI boot: globus (API v1.1, Module v0.6)
            SSI boot: rsh (API v1.1, Module v1.1)
            SSI boot: slurm (API v1.1, Module v1.0)
            SSI coll: lam_basic (API v1.1, Module v7.1)
            SSI coll: shmem (API v1.1, Module v1.0)
            SSI coll: smp (API v1.1, Module v1.2)
             SSI rpi: crtcp (API v1.1, Module v1.1)
             SSI rpi: lamd (API v1.0, Module v7.1)
             SSI rpi: sysv (API v1.0, Module v7.1)
             SSI rpi: tcp (API v1.0, Module v7.1)
             SSI rpi: usysv (API v1.0, Module v7.1)
              SSI cr: self (API v1.0, Module v1.0)
[gscluster_at_prawra ~]$
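
Since both transcripts above go through $PATH, a version check that
bypasses PATH entirely may also be worth running (a sketch using the
absolute prefix reported by laminfo; it assumes rsh to the local host
works as well):

$ for h in seine prawra; do echo "== $h =="; rsh $h /home/gscluster/lam/bin/laminfo | grep "LAM/MPI"; done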

I want to be sure that the cluster setup is flawless; it
is critical for me. Is running the test suite successfully
a sufficient indicator that LAM/MPI is set up correctly?
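
(For reference, a minimal end-to-end sanity check is sketched below;
bhost is a hypothetical boot schema file listing seine and prawra, and
hello is the example application mentioned later in this thread.)

$ lamboot -v bhost              # start the LAM run-time on both nodes
$ lamnodes                      # confirm both nodes joined
$ mpirun C -ssi rpi tcp hello   # run the example on all available CPUs
$ lamhalt                       # shut the run-time down cleanly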

Please help.

Regards,
pictguys

> It looks like you have LAM/MPI 7.0.0 installed as the default on one
> node, and 7.1.1 as the default on another.
>
> You might want to check differences in your PATH between interactive
> and non-interactive logins (i.e., you want to make sure that you're
> getting the same version of LAM -- assumedly 7.1.1 -- for both
> interactive and non-interactive logins). See the FAQ under the
> "Booting LAM" section for more details about shell startup files,
> etc.
>
> On Jul 12, 2005, at 10:55 AM, Shreyas Bhatewara wrote:
>
> > I set up a cluster of two nodes by installing LAM/MPI 7.1.1. Both
> > of the nodes have the same h/w configuration.
> > I used the command
> >> mpirun C -ssi rpi tcp chapter_10
> > and the error I got was this:
> >
> > ---------------------------------------------------------------------
> > It seems that [at least] one of the processes that was started with
> > mpirun chose a different RPI than its peers. For example, at least
> > the following two processes mismatched in their RPI selections:
> >
> > MPI_COMM_WORLD rank 1: tcp (v7.0.0)
> > MPI_COMM_WORLD rank 0: tcp (v7.1.0)
> >
> > All MPI processes must choose the same RPI module and version when
> > they start. Check your SSI settings and/or the local environment
> > variables on each node.
> > ---------------------------------------------------------------------
> >
> > The laminfo command shows the same version of the tcp rpi on both
> > nodes, which is 7.1.1.
> > What could the mismatch be? How can I set it right?
> > I executed the hello application from the examples; it works fine.
> > But the other applications do not work. I also tried all the rpi
> > modules (lamd, crtcp, etc.), but they result in the same kind of
> > error.
> >
> > Please help.
> >
> > Thanks in advance,
> > pictguys
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
>
