LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-07-30 08:38:13


On Mon, 28 Jul 2003, choy hau yan wrote:

> I am using 4 processors to run a Fortran program (2 processors per
> machine). The two machines are connected by a network.
>
> question 1:
>
> What is the name of this architecture? Is it called shared
> distributed memory?

Sounds more like distributed memory. With MPI, you're using multiple
processes, each with its own unique memory space. Hence, it's distributed
memory.
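
To make "each with its own unique memory space" concrete, here is a
minimal sketch (the program name and the variable x are made up for
illustration). Every process has a private copy of x, so a change made
by rank 0 is not visible to the other ranks, even to one running on the
other processor of the same machine:

```fortran
      program ownmem
      implicit none
      include 'mpif.h'
      integer ierr, rank, x

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)

c     Every process starts with its own private copy of x.
      x = 0
c     Only rank 0 changes its copy.
      if (rank .eq. 0) x = 99
c     Wait until every process reaches this point, so rank 0 has
c     certainly done its assignment before anyone prints.
      call mpi_barrier(MPI_COMM_WORLD, ierr)
      print *, 'rank', rank, ' sees x =', x

      call mpi_finalize(ierr)
      end
```

Rank 0 prints 99 and every other rank prints 0. The only way to get the
value to another process is to send it explicitly as a message.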

> question 2:
>
> I am sending an array, say PLK2(400,12,12):
> call mpi_send(PLK2,57600,mpi_real,0,ja,mpi_comm_world, ierr)
>
> I just send the array from the master to the slave, which runs the
> calculation and then sends it back to the master.
> [snipped]
> At the start of the calculation, all the answers from the slave are
> the same, but after some do-loop iterations, I found that the correct
> answer from the slave becomes incorrect when it is sent to the master.
>
> For example, 2-5 values in this array, PLK2(400,12,12), are not the
> same. The values are correct in the slave, but after the master
> receives them, the numbers are different.

It's hard to tell from your description, but it sounds like you've got
some kind of buffer mismatch in there somewhere (for example, a send and
its matching receive that disagree on count, datatype, or tag).

Try filling your buffers with obvious sentinel values. For example, have
the master send all 17's, and check that the slave(s) receive them
intact. Then have each slave send back a buffer filled with its rank
number, and see whether the master receives it properly. And so on.

See if this helps find the error.
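
The sentinel test described above could look roughly like the following
Fortran 77 sketch. It reuses the PLK2(400,12,12) shape and the MPI_REAL
type from the question; the program name, the helper routines fill and
nmiss, and the tags 1 and 2 are made up for illustration. As an extra
check that goes slightly beyond the advice above, each receive buffer is
pre-filled with -1 and the received element count is read back with
MPI_GET_COUNT, so a short message or untouched elements stand out.

```fortran
      program sentnl
      implicit none
      include 'mpif.h'
      integer n
      parameter (n = 400*12*12)
      real plk2(400,12,12)
      integer ierr, rank, nprocs, p, cnt, nbad
      integer status(MPI_STATUS_SIZE)
      integer nmiss

      call mpi_init(ierr)
      call mpi_comm_rank(MPI_COMM_WORLD, rank, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)

      if (rank .eq. 0) then
c        Master: send a buffer of all 17's to each slave.
         call fill(plk2, n, 17.0)
         do p = 1, nprocs - 1
            call mpi_send(plk2, n, MPI_REAL, p, 1,
     &                    MPI_COMM_WORLD, ierr)
         end do
c        Master: each slave should return a buffer of its own rank.
         do p = 1, nprocs - 1
c           Pre-fill with -1 so untouched elements stand out.
            call fill(plk2, n, -1.0)
            call mpi_recv(plk2, n, MPI_REAL, p, 2,
     &                    MPI_COMM_WORLD, status, ierr)
            call mpi_get_count(status, MPI_REAL, cnt, ierr)
            nbad = nmiss(plk2, n, real(p))
            print *, 'from rank', p, ': count =', cnt,
     &               ' mismatches =', nbad
         end do
      else
c        Slave: receive from the master and check for 17's.
         call fill(plk2, n, -1.0)
         call mpi_recv(plk2, n, MPI_REAL, 0, 1,
     &                 MPI_COMM_WORLD, status, ierr)
         call mpi_get_count(status, MPI_REAL, cnt, ierr)
         nbad = nmiss(plk2, n, 17.0)
         print *, 'rank', rank, ': count =', cnt,
     &            ' mismatches =', nbad
c        Slave: send back a buffer filled with this rank number.
         call fill(plk2, n, real(rank))
         call mpi_send(plk2, n, MPI_REAL, 0, 2,
     &                 MPI_COMM_WORLD, ierr)
      end if

      call mpi_finalize(ierr)
      end

c     Set all n elements of buf to val.
      subroutine fill(buf, n, val)
      implicit none
      integer n, i
      real buf(n), val
      do i = 1, n
         buf(i) = val
      end do
      end

c     Count the elements of buf that differ from val.
      integer function nmiss(buf, n, val)
      implicit none
      integer n, i
      real buf(n), val
      nmiss = 0
      do i = 1, n
         if (buf(i) .ne. val) nmiss = nmiss + 1
      end do
      end
```

Run it on at least two processes. If the transfers are healthy, every
line should report a count of 57600 (400*12*12) and 0 mismatches. If
this passes while the real program still fails, that would suggest the
problem lies in how the real program pairs its sends and receives
rather than in the transfer itself.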

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/