LAM/MPI General User's Mailing List Archives

From: Vishal Sahay (vsahay_at_[hidden])
Date: 2004-03-30 19:20:40


Hi --

Even if there are problems with NFS, they are unlikely to cause a
segfault (signal 11) on their own. Can you find anything out from the
core file that was dumped, or send it along?
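
A rough sketch of one way to pull a backtrace out of a core file. It assumes a bash shell, that gdb is installed on the node, and that core files are written as `core` or `core.<pid>` in the directory being searched (naming and location vary by system). The helper name `show_backtrace` is made up for illustration.

```shell
# Hypothetical helper: print a gdb backtrace for each core file in a directory.
# Usage: show_backtrace <executable> [directory]
show_backtrace() {
    local exe="$1"
    local dir="${2:-.}"
    local found=0
    local core
    for core in "$dir"/core "$dir"/core.*; do
        [ -f "$core" ] || continue      # skip glob patterns that matched nothing
        found=1
        echo "== $core =="
        # -batch runs the given commands and exits; "bt" prints the stack trace
        gdb -batch -ex bt "$exe" "$core"
    done
    [ "$found" -eq 1 ] || echo "no core files found in $dir"
}
```

For example, `show_backtrace ./file_status_get_count .` run in the test's working directory on the node that reported signal 11. Core files are often disabled by default; `ulimit -c unlimited` turns them on, but the limit has to be in effect in the environment the failing process actually runs in.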

Can you run the tests under /<lam_source_dir>/romio/test to see whether
problems show up there as well? Also verify that this is the only ROMIO
installation and that you don't have a broken or different version of
ROMIO on the system. LAM has tweaked its copy of ROMIO to suit its MPI,
so that is the only one it should use; other versions would cause problems.

-Vishal

On Mon, 29 Mar 2004, Tod Hagan wrote:

# Testing my build of lam-7.0.4 with lamtests-7.0.4 fails just one test,
# for file_status_get_count.
#
# This is on RHEL 3:
#
# Linux 2.4.21-9.ELsmp #1 SMP Thu Jan 8 17:08:56 EST 2004 i686 i686 i386 GNU/Linux
#
# I couldn't use the stock RedHat version, as I needed to build with
# support for the Portland Group pgf77 Fortran compiler we must use.
#
# Searching the archives turned up this failure being caused by NFS
# problems. Is this the only test that relies on NFS?
#
# We don't need this capability right now (our current application runs
# okay under LAM and apparently doesn't use this feature), but in the long
# run I'd like to have it pass all the tests.
#
# Any suggestions?
#
# Thanks in advance.
#
# > make check-TESTS
# > make[2]: Entering directory `/usr/local/src/lamtests-7.0.4/io'
# > mpirun -x TEST -s h C -ssi rpi sysv /usr/local/src/lamtests-7.0.4/io/./file_status_get_count
# > -----------------------------------------------------------------------------
# > One of the processes started by mpirun has exited with a nonzero exit
# > code. This typically indicates that the process finished in error.
# > If your process did not finish in error, be sure to include a "return
# > 0" or "exit(0)" in your C code before exiting the application.
# >
# > PID 29156 failed on node n2 (10.0.0.118) due to signal 11.
# > -----------------------------------------------------------------------------
# > mpirun -x TEST -s h C -ssi rpi usysv /usr/local/src/lamtests-7.0.4/io/./file_status_get_count
# > -----------------------------------------------------------------------------
# > One of the processes started by mpirun has exited with a nonzero exit
# > code. This typically indicates that the process finished in error.
# > If your process did not finish in error, be sure to include a "return
# > 0" or "exit(0)" in your C code before exiting the application.
# >
# > PID 29161 failed on node n2 (10.0.0.118) due to signal 11.
# > -----------------------------------------------------------------------------
# > mpirun -x TEST -s h C -ssi rpi tcp /usr/local/src/lamtests-7.0.4/io/./file_status_get_count
# > -----------------------------------------------------------------------------
# > One of the processes started by mpirun has exited with a nonzero exit
# > code. This typically indicates that the process finished in error.
# > If your process did not finish in error, be sure to include a "return
# > 0" or "exit(0)" in your C code before exiting the application.
# >
# > PID 29164 failed on node n2 (10.0.0.118) due to signal 11.
# > -----------------------------------------------------------------------------
# > FAIL: file_status_get_count
# > ===================
# > 1 of 1 tests failed
# > ===================
# > make[2]: *** [check-TESTS] Error 1
# > make[2]: Leaving directory `/usr/local/src/lamtests-7.0.4/io'
# > make[1]: *** [check-am] Error 2
#