LAM/MPI General User's Mailing List Archives

From: Siegmar Gross (Siegmar.Gross_at_[hidden])
Date: 2008-06-12 11:16:44


Hi,

I have a problem with derived data types and MPI_Scatter/MPI_Gather
(Solaris 10 sparc, LAM-MPI 7.1.4).

I want to distribute the columns of a matrix. At first I wrote a C
program which implemented the derived data type "coltype" and distributed
the columns via MPI_Send/MPI_Recv without problems. Next I modified the
program and used MPI_Scatter/MPI_Gather to distribute and collect the
columns. I implemented "coltype" once more with MPI_Type_struct. The
program didn't work, so I used a 2x2 matrix to figure out what's wrong.
Each process prints its column elements after MPI_Scatter. The process
with rank 1 didn't get the values "2" and "4" (see below), but values
close to 0. I then used a 4x2 matrix, still with a 2-element column (so I
should see the upper 2x2 "matrix" in my columns), to find out which values
process 1 receives. As you can see below, it got "5" and "7", i.e.
the values of the block which starts just after the first block and not
the values of the block which starts after the first element of the
first block (a[2][0] instead of a[0][1]).

Since I wasn't sure if I could use MPI_Type_struct, I rewrote the program
with MPI_Type_vector. This time the result was better but still not
satisfying. Process 1 got values from the second column, but one row too
late (starting with a[1][1] instead of a[0][1]).

I assume that I have misunderstood a concept or that I have a programming
error in my code, because I ran into the same problem with MPICH,
MPICH2, and OpenMPI, and it is not very likely that all implementations
have a bug. Since I don't know how to proceed, I would be very grateful
if someone could tell me whether I must blame myself for the error or
whether it is possibly a bug in the implementations of the MPI libraries
(however unlikely that is).

MPI_Type_struct
===============

tyr e5 158 mpicc e5_1a.c
tyr e5 159 mpirun -np 2 a.out

original matrix:

     1 2
     3 4

rank: 0 c0: 1 c1: 3
rank: 1 c0: 5.51719e-313 c1: 4.24399e-314

tyr e5 160 mpicc e5_1a.c
tyr e5 161 mpirun -np 2 a.out

original matrix:

     1 2
     3 4
     5 6
     7 8

rank: 0 c0: 1 c1: 3
rank: 1 c0: 5 c1: 7

MPI_Type_vector
===============

tyr e5 119 mpicc e5_1b.c
tyr e5 120 mpirun -np 2 N a.out

original matrix:

     1 2
     3 4
     5 6
     7 8

rank: 0 c0: 1 c1: 3
rank: 1 c0: 4 c1: 6

Thank you very much for any help or suggestions in advance.

Kind regards

Siegmar



  • TEXT/x-sun-c-file attachment: e5_1a.c

  • TEXT/x-sun-c-file attachment: e5_1b.c