I'm still chasing the bug - I have no idea where it might be. Thanks a lot
for the advice; any other suggestions are also welcome.
I doubt that the ordering of floating-point computations is causing the
problem, since I run the same image (executable) on all the CPUs. In fact, I
start the program by issuing "mpirun c0-3 a.out 128". 128 is an argument to
the program and is interpreted as the number of blocks in a row/column of
the matrix. Moreover, it is strange that the same image works fine with
smaller sizes (e.g. 16, 32 or 64). Nevertheless, I will check my code for
any sources of FP ordering problems - it really smells like that.
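(As a reminder of what that kind of problem looks like, here is a tiny,
purely illustrative example: the same three values added in two different
association orders give different doubles. Splitting a sum across processes
changes the association order in exactly this way.

    #include <cstdio>

    int main()
    {
        // Same three values, two association orders.
        double a = 1e16, b = -1e16, c = 1.0;
        double left  = (a + b) + c;   // = 1.0
        double right = a + (b + c);   // = 0.0, because b + c rounds back to -1e16
        std::printf("left = %.17g, right = %.17g\n", left, right);
        return 0;
    }

If the per-process partial results in the 128-block case are combined in a
different order than in the 1-CPU run, small differences of this kind are
expected; whether they matter depends on the algorithm.)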
Just to check this out - does MPI perform any conversions that might cause
loss of precision in a HOMOGENEOUS cluster?
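(One quick way to rule out any such conversion on a homogeneous cluster is
to send the same buffer once as MPI_DOUBLE and once as MPI_BYTE and compare
the received bytes. A minimal sketch, with illustrative names and a two-rank
setup assumed:

    #include <mpi.h>
    #include <cstdio>
    #include <cstring>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int N = 1024;
        std::vector<double> buf(N);

        if (rank == 0) {
            for (int i = 0; i < N; ++i)
                buf[i] = 1.0 / (i + 1);              // arbitrary test data
            MPI_Send(&buf[0], N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
            MPI_Send(&buf[0], N * (int) sizeof(double), MPI_BYTE, 1, 1,
                     MPI_COMM_WORLD);
        } else if (rank == 1) {
            std::vector<double> asDouble(N), asBytes(N);
            MPI_Status status;
            MPI_Recv(&asDouble[0], N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     &status);
            MPI_Recv(&asBytes[0], N * (int) sizeof(double), MPI_BYTE, 0, 1,
                     MPI_COMM_WORLD, &status);
            // On a homogeneous cluster both receives should be bit-identical.
            std::printf("bitwise identical: %s\n",
                        std::memcmp(&asDouble[0], &asBytes[0],
                                    N * sizeof(double)) == 0 ? "yes" : "NO");
        }

        MPI_Finalize();
        return 0;
    }

If the two receives ever differ, the datatype conversion path would be
implicated; otherwise the problem is almost certainly elsewhere.)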
----- Original Message -----
From: "Jeff Squyres" <jsquyres_at_[hidden]>
To: "General LAM/MPI mailing list" <lam_at_[hidden]>
Sent: Wednesday, June 23, 2004 2:57 PM
Subject: Re: LAM: multiple runs - different results
> I think what Anju was trying to say is that you might want to check your
> application for errors. While we never claim that LAM is 100% bug-free,
> this doesn't *seem* to have to do with LAM. LAM will do the translation
> for you when sending between heterogeneous machines, but it should be
> fairly easy to ensure that what was sent between opposite-endian machines
> is what is actually received. If you find that LAM is screwing that up,
> please be sure to let us know.
>
> Also be aware of the ordering issues of floating point computations. For
> example, for some operations, (A * B * C) is not always the same as (C * B
> * A). This is not really a LAM issue, but more of a how-computers-work
> issue. You might want to see if your code preserves the ordering properly
> when you spread across multiple processes.
>
> Otherwise, you might want to use a memory-checking debugger such as
> valgrind, bcheck (Solaris), or purify (costs money) to see if anything
> obvious jumps out at you.
>
> Hope that helps.
>
>
> On Wed, 23 Jun 2004, Angel Tsankov wrote:
>
> > In my original post I used "CPU" to mean process, and "CPUs" to mean processes
> > (where each one runs on a different node, i.e. using the -c argument of
> > mpirun).
> >
> > ----- Original Message -----
> > From: "Prabhanjan Kambadur" <pkambadu_at_[hidden]>
> > To: "General LAM/MPI mailing list" <lam_at_[hidden]>
> > Sent: Wednesday, June 23, 2004 1:58 AM
> > Subject: Re: LAM: multiple runs - different results
> >
> >
> >>
> >> Sorry for the delay. Ideally, it should not matter whether you are
> >> running on 1, 2 or 4 CPUs. Increasing the number of CPUs is usually
> >> done to improve the turnaround time and/or to make better use of the
> >> available hardware (only in well-written applications). When you say
> >> that you are getting erroneous results on 2 or 4 CPUs with the largest
> >> problem size, are you still running the same number of processes as you
> >> did in the single-CPU run? This would rule out application coding
> >> errors.
> >> Could you please give more details regarding this?
> >>
> >> Regards,
> >> Anju
> >>
> >> On Sun, 20 Jun 2004, Angel Tsankov wrote:
> >>
> >>> The discrete size of a problem that I'm trying to solve can take 4
> >>> different values (let's say these are 16, 32, 64, 128). I've written a
> >>> C++ program to perform the appropriate computations. The program is
> >>> expected to be run on 1, 2 or 4 CPUs.
> >>>
> >>> When the app is run to solve either the 16-, 32- or 64-size problem,
> >>> it returns the expected results no matter the number of CPUs used.
> >>> When the program is run on a single CPU to solve the 128-size problem,
> >>> it also returns the expected results. Surprisingly, I get unexpected
> >>> results only with the largest problem size on 2 and 4 CPUs.
> >>> The program transfers arrays of doubles using MPI_Irecv, MPI_Issend
> >>> and MPI_Waitany.
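(A common source of size-dependent failures with this combination is reading
or reusing a buffer before MPI_Waitany has completed the request that owns
it; small messages can mask the bug through internal buffering and timing,
larger ones often do not. A minimal sketch of the intended discipline, with
illustrative names rather than the poster's actual code:

    #include <mpi.h>
    #include <vector>

    // Exchange one block with each neighbour, then drain all requests
    // before any of the buffers is touched again.
    void exchange_blocks(const std::vector<int>& neighbours,
                         std::vector<std::vector<double> >& sendBlocks,
                         std::vector<std::vector<double> >& recvBlocks,
                         MPI_Comm comm)
    {
        const int n = static_cast<int>(neighbours.size());
        std::vector<MPI_Request> requests(2 * n);

        for (int i = 0; i < n; ++i) {
            MPI_Irecv(&recvBlocks[i][0],
                      static_cast<int>(recvBlocks[i].size()), MPI_DOUBLE,
                      neighbours[i], 0, comm, &requests[2 * i]);
            MPI_Issend(&sendBlocks[i][0],
                       static_cast<int>(sendBlocks[i].size()), MPI_DOUBLE,
                       neighbours[i], 0, comm, &requests[2 * i + 1]);
        }

        // No sendBlocks[i] or recvBlocks[i] may be read or written here.
        for (int done = 0; done < 2 * n; ++done) {
            int index;
            MPI_Status status;
            MPI_Waitany(2 * n, &requests[0], &index, &status);
            // Only the buffer behind requests[index] (recvBlocks[index / 2]
            // or sendBlocks[index / 2]) is safe to use from this point on.
        }
    }

If any buffer is modified between the posts and the corresponding wait, the
observed behaviour can depend on message size and timing, which would fit
the "only fails at 128" symptom.)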
> >>>
> >>> Does anyone have an idea what the problem could be?
> >>>
> >>> Since the cluster is homogeneous, I've also tried transferring the
> >>> arrays of doubles as arrays of bytes (with sizeof( double ) times as
> >>> many elements). This was to check if LAM performs some conversions
> >>> that could result in loss of precision. Unfortunately, this did not
> >>> help.
> >>> I'm investigating the issue further, but it is a bit difficult to
> >>> debug a program that solves a problem of that size. In fact, I've
> >>> tried to implement the steepest descent algorithm for a
> >>> block-tridiagonal matrix of size 128x128, where each element is a
> >>> 128x128 matrix of doubles.
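(One place where the process count alone changes the floating-point result
in steepest descent is the global dot products used for the step length:
each process accumulates its local part and the partial sums are then
combined, so the rounding order differs from the 1-CPU run. A sketch of that
pattern, with illustrative names and MPI_Allreduce standing in for however
the actual code combines the partial sums:

    #include <mpi.h>
    #include <cstddef>
    #include <vector>

    // Illustrative distributed dot product: each rank sums its own slice
    // first, then the partial sums are combined, so the association order
    // (and hence the last bits of the result) differs from a serial sum.
    double distributed_dot(const std::vector<double>& x,
                           const std::vector<double>& y, MPI_Comm comm)
    {
        double local = 0.0;
        for (std::size_t i = 0; i < x.size(); ++i)
            local += x[i] * y[i];

        double global = 0.0;
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, comm);
        return global;
    }

Tiny differences from this effect are normal; if the final answers differ by
much more than the last few digits, an application coding error is the more
likely explanation, as suggested above.)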
> >>>
> >>>
> >>
> >
> >
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>