
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-09-10 10:04:55


Are you using the MPI-2 I/O routines? That is the only case where
the ADIOI_Set_lock code would come into play.
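For context, a minimal sketch of what "MPI-2 I/O" means here (this is an illustrative example, not code from the application in question): each rank writes its own block of a shared file through MPI-IO, which in ROMIO-based implementations is the layer where ADIOI_Set_lock performs file locking.

```c
/* Hypothetical minimal MPI-2 I/O example: each rank writes one double
   to its own offset in a shared file.  On NFS, ROMIO may lock byte
   ranges during this, which is where the ADIOI_Set_lock error is
   raised if locking fails. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    int rank;
    double val;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    val = (double) rank;

    /* Collective open of a shared output file (filename is made up). */
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Each rank writes at a disjoint offset. */
    MPI_File_write_at(fh, (MPI_Offset)(rank * sizeof(double)),
                      &val, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```

If the application does anything like this on an NFS-mounted directory, the locking error from the test suite would be directly relevant.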

Although applications are generally source compatible between LAM/MPI
and MPICH (they're both implementations of the same standard, after
all), there are slight differences in the implementations. Most of the
time, these differences aren't noticeable, but running an MPI
application exclusively under one implementation and then bringing it
over to another can highlight latent application bugs.

LAM is pretty stable, and while I'm obviously not going to say that it
is guaranteed to be 100% bug free, have you checked your app to ensure
that it doesn't make some MPI assumptions that may be true in MPICH but
aren't true in LAM?
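A classic (hypothetical) example of such an assumption: relying on MPI_Send to buffer messages internally. The standard allows MPI_Send to block until a matching receive is posted, so code like the first exchange below may happen to work under one implementation and deadlock under another.

```c
/* Illustrative sketch of an implementation-dependent assumption
   (not taken from the application in question). */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, other;
    int sendbuf[1024], recvbuf[1024];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    other = 1 - rank;   /* assumes exactly 2 ranks, for brevity */

    /* UNSAFE: both ranks send first.  This only completes if the
       implementation buffers the message; the MPI standard does not
       guarantee that, so this can deadlock. */
    MPI_Send(sendbuf, 1024, MPI_INT, other, 0, MPI_COMM_WORLD);
    MPI_Recv(recvbuf, 1024, MPI_INT, other, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    /* Portable alternative: MPI_Sendrecv pairs the send and receive
       so no buffering assumption is needed. */
    MPI_Sendrecv(sendbuf, 1024, MPI_INT, other, 1,
                 recvbuf, 1024, MPI_INT, other, 1,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}
```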

From the wording of your mail, I can't quite tell what the exact
problem is -- are you just looking at the stdout from mpirun? Or are
your numbers output into files? If you're just looking at stdout, if
you have multiple MPI processes writing to stdout simultaneously, MPI
makes no guarantee about the order in which it is displayed. Indeed,
this is an inherent race condition -- you never know exactly which node
is going to print when, etc. Is this what you're describing?
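To make the race concrete, here is a small illustrative sketch (again, not your code): each rank prints independently, so the lines can arrive at mpirun's stdout in any order from run to run.

```c
/* Sketch of nondeterministic stdout ordering across MPI ranks. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* These lines may interleave in any order across runs. */
    printf("rank %d of %d: result line\n", rank, size);

    /* A common workaround: take turns, one rank at a time.  Note that
       even this only approximates ordering -- output forwarding
       through mpirun can still reorder lines in flight. */
    for (i = 0; i < size; i++) {
        if (rank == i)
            printf("rank %d: ordered line\n", rank);
        fflush(stdout);
        MPI_Barrier(MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

The only fully reliable approach is to have one rank gather the results and do all the printing (or have each rank write to its own file).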

On Sep 9, 2004, at 8:52 PM, David Kendig wrote:

> I am trying to get LAM-MPI installed and running on an Apple OsX Xserve
> cluster. Everything seems to be installed and working fine HOWEVER
> when
> I compare the output file from the single processor run to the output
> file from the multi processor run using LAM-MPI, the numbers don't
> agree. The MPI run numbers seem to be mixed up and slightly out of
> position.
>
> SO...I ran the test suite and discovered this error:
>
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3 and mount the directory with the 'noac'
> option
> (no attribute caching).
>
> Could this error be related? We are using NFS version 3. (I'm not sure
> about the 'noac' option and don't know how to find out on OSX).
> Although NFS is used for the users home space, MPI-LAM is installed
> locally on each of the nodes.
>
> I am out of ideas and don't want to go back to MPICH which does seem to
> work fine.
>
> Help and thanks,
>
> David Kendig
> NASA/GSFC
> Greenbelt, MD
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/