The discrete size of the problem I'm trying to solve can take 4 different
values (let's say these are 16, 32, 64 and 128). I've written a C++ program to
perform the appropriate computations. The program is expected to be run on
1, 2 or 4 CPUs.
When the app is run to solve the 16-, 32- or 64-size problem, it
returns the expected results no matter how many CPUs are used. When the
program is run on a single CPU to solve the 128-size problem, it also returns
the expected results. Surprisingly, I get unexpected results only with the
largest problem size on 2 and 4 CPUs.
The program transfers arrays of doubles using MPI_Irecv, MPI_Issend and
MPI_Waitany.
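The communication pattern looks roughly like this (a stripped-down sketch, not
the real code; the buffer size, tag and two-process setup are made up for
illustration):

```cpp
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n = 128 * 128;              // one 128x128 block of doubles
    std::vector<double> sendbuf(n, rank), recvbuf(n, 0.0);

    if (size == 2) {
        int peer = 1 - rank;
        MPI_Request reqs[2];
        // The receive is posted independently of the send; note that
        // MPI_Issend completes only after the matching receive has started.
        MPI_Irecv(recvbuf.data(), n, MPI_DOUBLE, peer, 0,
                  MPI_COMM_WORLD, &reqs[0]);
        MPI_Issend(sendbuf.data(), n, MPI_DOUBLE, peer, 0,
                   MPI_COMM_WORLD, &reqs[1]);

        // Drain both requests with MPI_Waitany; neither buffer may be
        // touched until its request has completed.
        for (int done = 0; done < 2; ++done) {
            int idx;
            MPI_Waitany(2, reqs, &idx, MPI_STATUS_IGNORE);
        }
        std::printf("rank %d exchanged a block with rank %d\n", rank, peer);
    }
    MPI_Finalize();
    return 0;
}
```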
Does anyone have an idea what the problem could be?
Since the cluster is homogeneous, I've also tried transferring the arrays of
doubles as arrays of bytes (with sizeof(double) times as many elements). This
was to check whether LAM performs some conversion that could result in a loss
of precision. Unfortunately, this did not help.
I'm investigating the issue further, but it is a bit difficult to debug a
program that solves a problem of that size. In fact, I've tried to implement
the steepest descent algorithm for a block-tridiagonal matrix of size
128x128, where each element is itself a 128x128 matrix of doubles.