
LAM/MPI General User's Mailing List Archives


From: Gaute Lindkvist (lindkvis_at_[hidden])
Date: 2007-01-09 12:28:37


I am currently trying to run a very simple ScaLAPACK test program on a
LAM/MPI system, but keep running into MPI errors. If anyone could help
me with this, it would be appreciated.

The code fails in BLACS_GRIDINFO, but gets through BLACS_PINFO and
BLACS_GET fine.

The code below is simplified from the ScaLAPACK website example; the
original failed at the same stage:

    PROGRAM HELLO
* -- BLACS example code --
* Written by Clint Whaley 7/26/94
* Performs a simple check-in type hello world
* ..
* .. External Functions ..
      INTEGER BLACS_PNUM
      EXTERNAL BLACS_PNUM
* ..
* .. Variable Declaration ..
      INTEGER CONTXT, IAM, NPROCS, NPROW, NPCOL, MYPROW, MYPCOL
      INTEGER ICALLER, I, J, HISROW, HISCOL, ERROR
*
* Determine my process number and the number of processes in
* machine
*
* CALL MPI_INIT (ERROR)
      CALL BLACS_PINFO(IAM, NPROCS)
      WRITE (*,*) "Process", IAM+1, "out of", NPROCS
*
* If in PVM, create virtual machine if it doesn't exist
*
      IF (NPROCS .LT. 1) THEN
         IF (IAM .EQ. 0) THEN
            WRITE(*, 1000)
            READ(*, *) NPROCS
         END IF
         CALL BLACS_SETUP(IAM, NPROCS)
      END IF
*
* Set up process grid that is as close to square as possible
*
      NPROW = INT( SQRT( REAL(NPROCS) ) )
      NPCOL = NPROCS / NPROW
*
* Get default system context, and define grid
*
      CALL BLACS_GET(0, 0, CONTXT)
      WRITE (*,*) "Process", IAM+1, "context", CONTXT
      CALL BLACS_GRIDINIT(CONTXT, 'Row', NPROW, NPCOL)
      WRITE (*,*) "Context after gridinit", CONTXT
      CALL BLACS_GRIDINFO(CONTXT, NPROW, NPCOL, MYPROW, MYPCOL)
      WRITE (*,*) "Gridinfo", MYPROW, MYPCOL
* CALL MPI_FINALIZE(error)
 1000 FORMAT ('How many processes in the machine?')
      STOP
      END

The output is:
Process 1 out of 2
Process 1 context 0
MPI_Comm_group: invalid communicator: Invalid argument (rank 0, MPI_COMM_WORLD)
Rank (0, MPI_COMM_WORLD): Call stack within LAM:
Rank (0, MPI_COMM_WORLD): - MPI_Comm_group()
Rank (0, MPI_COMM_WORLD): - main()
 Process 2 out of 2
 Process 2 context 0
MPI_Comm_group: invalid communicator: Invalid argument (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Comm_group()
Rank (1, MPI_COMM_WORLD): - main()

The code is compiled with:
 mpif77 -o hello hello.f libscalapack.a blacsF77init_MPI-LINUX-0.a \
   blacs_MPI-LINUX-0.a blacsF77init_MPI-LINUX-0.a \
   /home/lindkvis/Applications/Atlas/lib/libf77blas.a \
   /home/lindkvis/Applications/Atlas/lib/libatlas.a

On this system:
           LAM/MPI: 7.0.6
            Prefix: /usr
      Architecture: x86_64-redhat-linux-gnu
     Configured by: bhcompile
     Configured on: Tue Nov 2 16:57:23 EST 2004
    Configure host: dolly.build.redhat.com
        C bindings: yes
      C++ bindings: yes
  Fortran bindings: yes
       C profiling: yes
     C++ profiling: yes
 Fortran profiling: yes
     ROMIO support: yes
      IMPI support: no
     Debug support: no
      Purify clean: no
          SSI boot: globus (Module v0.5)
          SSI boot: rsh (Module v1.0)
          SSI coll: lam_basic (Module v7.0)
          SSI coll: smp (Module v1.0)
           SSI rpi: crtcp (Module v1.0.1)
           SSI rpi: lamd (Module v7.0)
           SSI rpi: sysv (Module v7.0)
           SSI rpi: tcp (Module v7.0)
           SSI rpi: usysv (Module v7.0)

And run with:
mpirun -np 2 hello

Using the following lamhost file (dual-core AMD Opteron test system):
localhost cpu=2
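
The LAM universe is booted beforehand with the usual lamboot; the
hostfile name below is just a placeholder for wherever the line above
is saved:

 lamboot -v <hostfile>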

I tried adding an MPI_INIT and a subsequent MPI_FINALIZE (sketched
below) without any effect. Regular MPI C++ code runs fine on this
system.
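
For reference, this is roughly what the MPI_INIT/MPI_FINALIZE variant
looked like (a minimal sketch: it is just the program above with the
commented-out MPI calls enabled and the intermediate WRITE statements
trimmed):

      PROGRAM HELLO2
* .. Variable Declaration ..
      INTEGER CONTXT, IAM, NPROCS, NPROW, NPCOL, MYPROW, MYPCOL
      INTEGER ERROR
*
* Initialise MPI explicitly before any BLACS call
      CALL MPI_INIT(ERROR)
      CALL BLACS_PINFO(IAM, NPROCS)
*
* Same near-square process grid as above
      NPROW = INT( SQRT( REAL(NPROCS) ) )
      NPCOL = NPROCS / NPROW
*
* Get default system context, define the grid, query it
      CALL BLACS_GET(0, 0, CONTXT)
      CALL BLACS_GRIDINIT(CONTXT, 'Row', NPROW, NPCOL)
      CALL BLACS_GRIDINFO(CONTXT, NPROW, NPCOL, MYPROW, MYPCOL)
      WRITE (*,*) "Gridinfo", MYPROW, MYPCOL
*
      CALL MPI_FINALIZE(ERROR)
      STOP
      END

It fails with the same MPI_Comm_group error.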

The system uses ATLAS for BLAS and LAPACK, and ScaLAPACK is compiled
against ATLAS.

I apologise if this is not the best mailing list to post to, but I
thought there was probably considerable ScaLAPACK experience here.

-- 
Gaute Lindkvist