I think what Anju was trying to say is that you might want to check your
application for errors. While we never claim that LAM is 100% bug-free,
this doesn't *seem* to have to do with LAM. LAM will do the translation
for you when sending between heterogeneous machines, but it should be
fairly easy to verify that what was sent between opposite-endian machines
is what was actually received. If you find that LAM is screwing that up,
please be sure to let us know.
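(If you want a quick way to convince yourself of that, a throwaway test
along these lines might do -- just a sketch I'm making up here, not code
from your application; it assumes you run it on exactly 2 processes and
compares the received bytes against the known pattern:)

  /* endian_check.cc -- throwaway sanity test (hypothetical, not from the
   * original application): rank 0 sends a known pattern of doubles to
   * rank 1, which checks that the bytes it received are identical. */
  #include <mpi.h>
  #include <stdio.h>
  #include <string.h>

  int main(int argc, char **argv)
  {
      int rank, i, identical;
      double pattern[8], recvbuf[8];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      /* values with non-trivial bit patterns */
      for (i = 0; i < 8; ++i)
          pattern[i] = 1.0 / (double) (i + 1);

      if (rank == 0) {
          MPI_Send(pattern, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {
          MPI_Recv(recvbuf, 8, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
          identical = (memcmp(recvbuf, pattern, sizeof(pattern)) == 0);
          printf("received doubles bitwise identical: %s\n",
                 identical ? "yes" : "NO");
      }

      MPI_Finalize();
      return 0;
  }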
Also be aware of ordering issues in floating-point computations. For
example, for some operations, (A * B * C) does not always equal (C * B *
A). This is not really a LAM issue, but more of a how-computers-work
issue. You might want to check whether your code preserves the ordering
properly when the work is spread across multiple processes.
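(A toy illustration of the point, using sums since that is usually where
a parallel decomposition changes the order -- nothing here is from your
code:)

  /* fp_order.cc -- toy demonstration that evaluation order alone can
   * change a floating-point result; the inputs are chosen to make the
   * difference large and obvious. */
  #include <stdio.h>

  int main(void)
  {
      double a = 1.0e16, b = -1.0e16, c = 1.0;

      double left_to_right = (a + b) + c;  /* 0 + 1 -> 1.0 */
      double right_to_left = a + (b + c);  /* c is lost rounding (b + c),
                                              so 1e16 - 1e16 -> 0.0 */

      printf("(a + b) + c = %g\n", left_to_right);
      printf("a + (b + c) = %g\n", right_to_left);
      return 0;
  }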
Otherwise, you might want to use a memory-checking debugger such as
valgrind, bcheck (Solaris), or purify (costs money) to see if anything
obvious jumps out at you.
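One more cheap check, since you mention MPI_Irecv / MPI_Issend /
MPI_Waitany below: after each completed receive you can compare
MPI_Get_count against the count you posted. Here's a rough sketch of what
I mean (made-up names and sizes -- a ring exchange of 128 doubles -- not
your code):

  /* waitany_check.cc -- hypothetical sketch, not from the original
   * program: exchange an array of doubles with MPI_Issend / MPI_Irecv /
   * MPI_Waitany and verify the received length with MPI_Get_count. */
  #include <mpi.h>
  #include <stdio.h>

  #define N 128

  int main(int argc, char **argv)
  {
      int rank, size, peer, i, idx, got;
      double sendbuf[N], recvbuf[N];
      MPI_Request req[2];
      MPI_Status status;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      peer = (rank + 1) % size;            /* simple ring, just for the demo */

      for (i = 0; i < N; ++i)
          sendbuf[i] = rank + i / (double) N;

      MPI_Irecv(recvbuf, N, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                MPI_COMM_WORLD, &req[0]);
      MPI_Issend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &req[1]);

      for (i = 0; i < 2; ++i) {
          MPI_Waitany(2, req, &idx, &status);
          if (idx == 0) {                  /* the receive completed */
              MPI_Get_count(&status, MPI_DOUBLE, &got);
              if (got != N)
                  fprintf(stderr, "rank %d: expected %d doubles, got %d\n",
                          rank, N, got);
          }
      }

      MPI_Finalize();
      return 0;
  }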
Hope that helps.
On Wed, 23 Jun 2004, Angel Tsankov wrote:
> In my original post I used "CPU" to mean process, and "CPUs" to mean
> processes (where each one runs on a different node, i.e. using the -c
> argument of mpirun).
>
> ----- Original Message -----
> From: "Prabhanjan Kambadur" <pkambadu_at_[hidden]>
> To: "General LAM/MPI mailing list" <lam_at_[hidden]>
> Sent: Wednesday, June 23, 2004 1:58 AM
> Subject: Re: LAM: multiple runs - different results
>
>
>>
>> Sorry for the delay. Ideally, it should not matter whether you are
>> running on 1, 2 or 4 CPUs. Increasing the number of CPUs is usually done
>> to improve the turnaround time and/or to make better use of the
>> available hardware (only in well-written applications). When you say
>> that you are getting erroneous results with 2 or 4 CPUs on the largest
>> problem size, are you still running the same number of processes as you
>> did in the single-CPU run? That would help rule out application coding
>> errors. Could you please give more details regarding this?
>>
>> Regards,
>> Anju
>>
>> On Sun, 20 Jun 2004, Angel Tsankov wrote:
>>
>>> The discrete size of a problem that I'm trying to solve can take 4
>>> different values (let's say these are 16, 32, 64, 128). I've written a
>>> C++ program to perform the appropriate computations. The program is
>>> expected to be run on 1, 2 or 4 CPUs.
>>>
>>> When the app is run to solve either the 16-, 32- or 64-size problem,
>>> it returns the expected results no matter the number of CPUs used.
>>> When the program is run on a single CPU to solve the 128-size problem,
>>> it also returns the expected results. Surprisingly, I get unexpected
>>> results only with the largest problem size on 2 and 4 CPUs.
>>> The program transfers arrays of doubles using MPI_Irecv, MPI_Issend and
>>> MPI_Waitany.
>>>
>>> Does anyone have an idea what the problem could be?
>>>
>>> Since the cluster is homogeneous, I've also tried transferring the
>>> arrays of doubles as arrays of bytes (with sizeof( double ) times as
>>> many elements). This was to check if LAM performs some conversions
>>> that could result in loss of precision. Unfortunately, this did not
>>> help.
>>> I'm investigating the issue further, but it is a bit difficult to
>>> debug a program that solves a problem of that size. In fact, I've
>>> tried to implement the steepest descent algorithm for a
>>> block-tridiagonal matrix of size 128x128, where each element is a
>>> 128x128 matrix of doubles.
>>>
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/