LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2003-08-05 10:32:35


On Mon, 4 Aug 2003, Bill Wendling wrote:

> I was wondering if there's anything wrong with this sample program:
>
> [snipped]

I think there's at least one thing wrong -- you're copying the file before
closing it.
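
To illustrate just the ordering (close first, then copy), here is a
minimal sketch using plain C stdio rather than MPI-IO. It is not your
program: the function names and file paths are hypothetical, the copy is
done in C instead of via system("cp ..."), and only the strings and the
offset come from your description.

```c
#include <stdio.h>
#include <string.h>

/* Copy src to dst byte for byte; returns 0 on success, -1 on error. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            rc = -1;
            break;
        }
    }
    if (ferror(in))
        rc = -1;
    fclose(in);
    if (fclose(out) != 0)
        rc = -1;
    return rc;
}

/* Write "hello world" at offset 0 and "NOT" at offset 12, then close
 * the file BEFORE copying it. Returns 0 on success, -1 on error. */
int write_close_then_copy(const char *src, const char *dst)
{
    FILE *fp = fopen(src, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite("hello world", 1, 11, fp) != 11 ||
        fseek(fp, 12L, SEEK_SET) != 0 ||
        fwrite("NOT", 1, 3, fp) != 3) {
        fclose(fp);
        return -1;
    }
    /* Closing flushes any buffered data, so the copy sees all of it. */
    if (fclose(fp) != 0)
        return -1;
    return copy_file(src, dst);
}
```

If the copy were made before the fclose(), some or all of the data could
still be sitting in the stdio buffer and would be missing from the copy.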

> It outputs only "NOT" at offset 12 in the file and not the "hello world"
> before it. I'm not sure what's wrong with the code. Does anyone have any
> ideas?

Also, I think you're violating a few assumptions. Here are a few points
(ROMIO authors please correct me if I get any of these wrong!):

1. /tmp is unlikely to be NFS shared. I'm guessing that you have ROMIO
compiled for UFS support, which means that ROMIO thinks all processes are
writing to one shared file. Since /tmp is not shared, ROMIO is unaware
that each process is actually writing to a separate file.

2. The file sizes are actually correct, and you are writing to the correct
offsets (check the output of "od zzz.h5" and you'll see).

3. Since you're doing a system("cp ..."), what I'm guessing is happening
is that you're actually seeing a race condition: per point 1, each MPI
process is writing to its own /tmp/zzz.h5 file, and then each of them
executes the cp to copy it to zzz.h5 in the local (presumably shared)
directory. One of them will win -- and it seems like your MPI_COMM_WORLD
rank 1 process is typically the winner. But even so, this is a race
condition and you should avoid it.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/