I have a somewhat novel application of OpenMP in conjunction with MPI,
in which I attempt to parallelize a blocking MPI send:
subroutine send_slaves(k,a,b)
  integer, intent(in) :: k
  real(kind=8), dimension(:,:), intent(in), target :: a
  real(kind=8), dimension(:), intent(in) :: b
  ! ...
  integer :: i, ii
  real(kind=8), dimension(:), pointer :: pt
  ! ...
  ! each OpenMP thread sends a different row of a to a different slave
  !$omp parallel default(shared) private(i, ii, pt)
  !$omp do
  do i = 1, n_slaves
     ii = k + i - 1
     if (ii > n_rows) cycle
     pt => a(ii,:)               ! pointer to row ii of a
     call MPI_SEND(pt, n_cols, MPI_DOUBLE_PRECISION, i, &
                   ii, MPI_COMM_WORLD, ierr)
  enddo
  !$omp end do
  !$omp end parallel
  nullify (pt)
end subroutine send_slaves
This subroutine is called by the master process inside a do loop as
follows:
do k = 1, n_rows, n_slaves
   call send_slaves(k, a, b)
   ! ...
enddo
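For comparison only (this is not in my program), I realize the obvious
single-threaded alternative would be to overlap the sends with non-blocking
MPI_ISEND calls and a single MPI_WAITALL, something like the sketch below;
as in send_slaves, n_slaves, n_rows and n_cols are assumed to come from an
elided module:
subroutine send_slaves_isend(k, a)
  use mpi
  ! ...  (module providing n_slaves, n_rows, n_cols elided, as above)
  integer, intent(in) :: k
  real(kind=8), dimension(:,:), intent(in) :: a
  integer :: i, ii, ierr, n_req
  integer, dimension(n_slaves) :: requests
  real(kind=8), dimension(n_cols, n_slaves) :: buf  ! contiguous copies of rows

  n_req = 0
  do i = 1, n_slaves
     ii = k + i - 1
     if (ii > n_rows) cycle
     buf(:, i) = a(ii, :)          ! copy row ii into a contiguous buffer
     n_req = n_req + 1
     call MPI_ISEND(buf(:, i), n_cols, MPI_DOUBLE_PRECISION, i, &
                    ii, MPI_COMM_WORLD, requests(n_req), ierr)
  enddo
  ! the buffers must remain valid until all sends complete
  call MPI_WAITALL(n_req, requests, MPI_STATUSES_IGNORE, ierr)
end subroutine send_slaves_isend
But my question here is specifically about the threaded version.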
The quantity "a" is an n_rows by n_cols matrix; "b" is an n_cols
vector. The program is compiled with the openmp flag. Using gfortran
and LAM, this parallelization of the MPI send actually works; running
the master on a two-processor machine (!$ CALL OMP_SET_NUM_THREADS(2))
and placing the slaves on a four-processor machine (mpirun -np 5 ...), I
get about a 30% speedup in the transmission of rows of the matrix to the
slave processes. However, with OPENMP, when I attempt to run this code,
I get a segmentation fault and the job dies. Now I can see a number of
reason where such a parallelization might run afoul of the MPI
standards; the success with LAM might well be a fluke. So was my
successful LAM run serendipitous and I am really afoul of the MPI
standards? Or do the standards even cover this situation and the
successful LAM run is simply a product of the particular MPI implementation?
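My (possibly naive) understanding is that the MPI-2 standard addresses
threaded callers through the thread-support level requested at
initialization. Something like the following sketch, which is not in my
program, would at least ask the library whether concurrent MPI_SEND calls
from different OpenMP threads are guaranteed to be safe:
program thread_init_check
  use mpi
  implicit none
  integer :: provided, ierr

  ! request full thread support and see what the implementation provides
  call MPI_INIT_THREAD(MPI_THREAD_MULTIPLE, provided, ierr)
  if (provided < MPI_THREAD_MULTIPLE) then
     ! concurrent MPI calls from several threads are not guaranteed safe
     print *, 'thread support level provided: ', provided
  end if

  ! ... rest of the program ...

  call MPI_FINALIZE(ierr)
end program thread_init_check
Is that the relevant part of the standard here, or am I missing something?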
-- Rich Naff