LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-01-18 22:05:32


LAM uses the ROMIO package to implement its MPI-IO functionality.
There is a note in the romio/README file about the use of NFS:

Using ROMIO on NFS
------------------

To use ROMIO on NFS, file locking with fcntl must work correctly on
the NFS installation. On some installations, fcntl locks don't work.
To get them to work, you need to use Version 3 of NFS and
have the system administrator mount the NFS file system with the
"noac" option (no attribute caching). Turning off attribute caching
may reduce performance, but it is necessary for correct behavior.

The following are some instructions we received from Ian Wells of HP
for setting the noac option on NFS. We have not tried them
ourselves. We are including them here because you may find
them useful. Note that some of the steps may be specific to HP
systems, and you may need root permission to execute some of the
commands.

>1. first confirm you are running nfs version 3
>
>rpcinfo -p `hostname` | grep nfs
>
>ie
> goedel >rpcinfo -p goedel | grep nfs
> 100003 2 udp 2049 nfs
> 100003 3 udp 2049 nfs
>
>
>2. then edit /etc/fstab for each nfs directory read/written by MPIO
> on each machine used for multihost MPIO.
>
> Here is an example of a correct fstab entry for /epm1:
>
> ie grep epm1 /etc/fstab
>
> ROOOOT 11>grep epm1 /etc/fstab
> gershwin:/epm1 /rmt/gershwin/epm1 nfs bg,intr,noac 0 0
>
> if the noac option is not present, add it
> and then remount this directory
> on each of the machines that will be used to share MPIO files
>
>ie
>
>ROOOOT >umount /rmt/gershwin/epm1
>ROOOOT >mount /rmt/gershwin/epm1
>
>3. Confirm that the directory is mounted noac:
>
>ROOOOT >grep gershwin /etc/mnttab
>gershwin:/epm1 /rmt/gershwin/epm1 nfs
>noac,acregmin=0,acregmax=0,acdirmin=0,acdirmax=0 0 0 899911504
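
A quick way to check whether the locking itself works -- before or after
changing the mount options -- is a small standalone fcntl test like the
sketch below. This is not from the ROMIO README; I'm just including it
for convenience, and the file name you pass it is up to you (point it at
a file on the NFS-mounted directory you plan to use for MPI-IO). Taking
an exclusive fcntl write lock is essentially what ROMIO's ADIOI_Set_lock
does, so if this fails, ROMIO will fail the same way.

    /* locktest.c: try to take an exclusive fcntl write lock on a file,
       roughly what ROMIO's ADIOI_Set_lock does.  Run it on each machine,
       on a file that lives on the shared NFS mount. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        int fd;
        struct flock lock;

        if (argc < 2) {
            fprintf(stderr, "usage: %s <file-on-nfs-mount>\n", argv[0]);
            return 1;
        }
        fd = open(argv[1], O_RDWR | O_CREAT, 0644);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        memset(&lock, 0, sizeof(lock));
        lock.l_type   = F_WRLCK;    /* exclusive write lock */
        lock.l_whence = SEEK_SET;
        lock.l_start  = 0;
        lock.l_len    = 0;          /* 0 = lock the whole file */

        if (fcntl(fd, F_SETLKW, &lock) < 0) {
            perror("fcntl(F_SETLKW)");  /* the failure ROMIO reports */
            close(fd);
            return 1;
        }
        printf("got exclusive fcntl lock on %s -- locking looks OK\n",
               argv[1]);

        lock.l_type = F_UNLCK;      /* release the lock before exiting */
        fcntl(fd, F_SETLK, &lock);
        close(fd);
        return 0;
    }

Compile it with plain gcc and run it on each node against a file on the
shared mount; if F_SETLKW fails there, MPI-IO on that mount will fail too.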

On Jan 10, 2006, at 12:59 PM, zkis_at_[hidden] wrote:

> Hi,
>
> I have been fighting with a problem for a while and couldn't find a
> solution so far. The problem is that my program regularly exits with
> the error message pasted at the end of this message. My system is
> rather new; it consists of AMD Athlon K7 and Intel Xeon processors
> with 100 Mbit Ethernet connections, and runs (Debian distribution)
> Linux kernel 2.6.10-14. I have lam-7.1.1 installed from a Debian
> package. Besides the LAM MPI libraries, I also use parallel HDF5 in
> my program, installed from the libhdf5-lam-1.6.2-0 package plus the
> necessary header files. The strange thing is that sometimes my
> program ends correctly, but most of the time it exits with an error.
> I have tested the connections between the machines; there is no
> problem. There is no error message in the log files either! No other
> application complains, only my MPI programs. The program seems
> correct; under MPICH no such error occurred.
>
> I would very much appreciate any suggestion.
>
> Best wishes,
>
> Zsolt Kis
>
> PS: Sorry for double posting!!
>
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>
> zsolt_at_sas:/bird/pool/zsolt$ mpirun -np 15 ppu alma
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3 and mount the directory with the 'noac'
> option (no attribute caching).
> File locking failed in ADIOI_Set_lock. If the file system is NFS, you
> need to use NFS version 3 and mount the directory with the 'noac'
> option (no attribute caching).
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 23175 failed on node n3 (192.168.1.33) with exit status 1.
> -----------------------------------------------------------------------------
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
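
One more thing worth trying: take HDF5 out of the picture and drive ROMIO
directly. The sketch below is not your code and not HDF5 -- the output
path in it is only a placeholder, so change it to a file on the NFS mount
your job actually writes to -- but it goes through the same ROMIO code
path, so if the mount is missing the noac option (or fcntl locking is
broken) it should fail with the same ADIOI_Set_lock message. If it runs
cleanly on all the nodes, your NFS setup is probably fine and the problem
is elsewhere.

    /* mpiio_test.c: each rank writes its own contiguous block of a shared
       file with MPI-IO, exercising the same ROMIO/NFS code path that
       parallel HDF5 uses.  The output path below is just an example. */
    #include <stdio.h>
    #include <mpi.h>

    #define COUNT 1024

    int main(int argc, char *argv[])
    {
        int rank, nprocs, i, buf[COUNT];
        MPI_File fh;
        MPI_Offset offset;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        for (i = 0; i < COUNT; i++)
            buf[i] = rank;
        /* one contiguous block per rank */
        offset = (MPI_Offset) rank * COUNT * sizeof(int);

        if (MPI_File_open(MPI_COMM_WORLD, "/bird/pool/zsolt/mpiio_test.dat",
                          MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL,
                          &fh) != MPI_SUCCESS) {
            fprintf(stderr, "rank %d: MPI_File_open failed\n", rank);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        MPI_File_write_at(fh, offset, buf, COUNT, MPI_INT,
                          MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        if (rank == 0)
            printf("MPI-IO write finished on %d processes\n", nprocs);

        MPI_Finalize();
        return 0;
    }

Compile it with mpicc and run it the same way you run your application
(e.g. mpirun -np 15 mpiio_test).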

-- 
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/