Sorry for the delay. Ideally, it should not matter whether you are
running on 1, 2 or 4 CPUs. Increasing the number of CPUs is usually done
to improve the turnaround time and/or to make better use of the available
hardware (only in well-written applications). When you say that you are
getting erroneous results with 2 or 4 CPUs for the largest problem size,
are you still running the same number of processes as you did in the
single-CPU run? That would help rule out application coding errors.
Could you please give more details regarding this?
Regards,
Anju
On Sun, 20 Jun 2004, Angel Tsankov wrote:
> The discrete size of a problem that I'm trying to solve can take 4 different
> values (let's say these are 16, 32, 64, 128). I've written a C++ program to
> perform the appropriate computations. The program is expected to be run on
> 1, 2 or 4 CPUs.
>
> When the app is run to solve the 16-, 32- or 64-size problem, it returns
> the expected results regardless of the number of CPUs used. When the
> program is run on a single CPU to solve the 128-size problem, it also
> returns the expected results. Surprisingly, I get unexpected results only
> with the largest problem size on 2 and 4 CPUs.
> The program transfers arrays of doubles using MPI_Irecv, MPI_Issend and
> MPI_Waitany.
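For reference, here is a minimal sketch of that kind of non-blocking
exchange; the ring neighbours, tag and block size below are made up for
illustration. The thing to double-check in your code is that no send buffer
is modified and no receive buffer is read before MPI_Waitany has completed
the corresponding request -- that kind of race often stays hidden at small
message sizes and only shows up at the largest one.

/* sketch.cpp: non-blocking exchange with MPI_Irecv / MPI_Issend /
 * MPI_Waitany; neighbours and block size are hypothetical. */
#include <mpi.h>
#include <vector>

int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );

    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    const int N = 128 * 128;                  /* one 128x128 block of doubles */
    std::vector<double> sendbuf( N, rank );
    std::vector<double> recvbuf( N, 0.0 );

    int right = ( rank + 1 ) % size;          /* made-up ring neighbours */
    int left  = ( rank - 1 + size ) % size;

    MPI_Request req[2];
    /* post the receive first, then the synchronous-mode send */
    MPI_Irecv(  &recvbuf[0], N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0] );
    MPI_Issend( &sendbuf[0], N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1] );

    for ( int done = 0; done < 2; ++done )
    {
        int idx;
        MPI_Status status;
        MPI_Waitany( 2, req, &idx, &status );
        /* recvbuf may be read only after req[0] has completed here;
         * sendbuf may be reused only after req[1] has completed here */
    }

    MPI_Finalize();
    return 0;
}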
>
> Does anyone have an idea what the problem could be?
>
> Since the cluster is homogeneous, I've also tried transferring the arrays
> of doubles as arrays of bytes (with sizeof( double ) times as many
> elements). This was to check whether LAM performs some conversion that
> could result in loss of precision. Unfortunately, this did not help.
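That is a reasonable check: on a homogeneous cluster the two variants below
post exactly the same bytes, so a difference between them really would point
at datatype conversion. The function name, dest and tag are placeholders.

#include <mpi.h>

/* both variants send the same n doubles; on a homogeneous cluster they
 * put identical bytes on the wire */
void send_block( double* buf, int n, int dest, int tag,
                 int as_bytes, MPI_Request* req )
{
    if ( as_bytes )
        /* byte-for-byte form used for the check described above */
        MPI_Issend( buf, n * (int) sizeof( double ), MPI_BYTE, dest, tag,
                    MPI_COMM_WORLD, req );
    else
        /* usual form: count given in doubles */
        MPI_Issend( buf, n, MPI_DOUBLE, dest, tag, MPI_COMM_WORLD, req );
}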
> I'm investigating the issue further, but it is a bit difficult to debug a
> program that solves a problem of that size. In fact, I've tried to implement
> the steepest descent algorithm for a block-tridiagonal matrix of size
> 128x128, where each element is a 128x128 matrix of doubles.
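For anyone following along, the iteration in question is sketched below in
serial form, with the 128x128 block-tridiagonal operator hidden behind a
placeholder matvec(). In a typical parallelisation only the matrix-vector
product (boundary exchange) and the dot-product reductions depend on the
number of processes, so that is where I would look for the divergence; note
that reductions summed in a different order can legitimately change the
last few bits of the result.

/* serial sketch of steepest descent for A x = b; matvec() is a placeholder
 * (here just the identity so the file compiles and runs on its own). */
#include <cmath>
#include <cstdio>
#include <vector>

typedef std::vector<double> Vec;

/* placeholder for y = A * x */
void matvec( const Vec& x, Vec& y ) { y = x; }

double dot( const Vec& a, const Vec& b )
{
    double s = 0.0;
    for ( std::size_t i = 0; i < a.size(); ++i ) s += a[i] * b[i];
    return s;
}

void steepest_descent( Vec& x, const Vec& b, double tol, int maxit )
{
    const std::size_t n = b.size();
    Vec r( n ), q( n );

    matvec( x, r );                                /* r = b - A*x       */
    for ( std::size_t i = 0; i < n; ++i ) r[i] = b[i] - r[i];

    for ( int it = 0; it < maxit && std::sqrt( dot( r, r ) ) > tol; ++it )
    {
        matvec( r, q );                            /* q = A*r           */
        double alpha = dot( r, r ) / dot( r, q );  /* exact line search */
        for ( std::size_t i = 0; i < n; ++i )
        {
            x[i] += alpha * r[i];                  /* x <- x + alpha*r  */
            r[i] -= alpha * q[i];                  /* r <- r - alpha*q  */
        }
    }
}

int main()
{
    Vec b( 4, 1.0 ), x( 4, 0.0 );
    steepest_descent( x, b, 1e-12, 100 );
    std::printf( "x[0] = %g\n", x[0] );            /* 1 for the identity */
    return 0;
}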
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>