LAM/MPI General User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2004-06-09 23:44:12


This is because the topo test asks for 6 processes, and in this particular
test case, it's using the sysv RPI (System V shared memory). The Solaris
default shared memory limits are pretty small, and can't handle the amount
of shared memory required for 6 processes (since they're all on one
node). If you had run the tests across more than one node, then fewer than
6 processes would be run on each node, less shared memory would be
required per node, and the test would likely pass.
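
If you want a rough idea of where a node's System V shared memory limit
sits, a small probe along these lines might help. This is only a sketch:
it is not part of LAM or its test suite, probe_shm is a hypothetical
helper name, and the sizes worth trying are an assumption on your side
(the real requirements are in the sysv RPI docs). It simply asks the
kernel for one private segment of a given size and removes it again.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Hypothetical helper (not a LAM function): try to create, and then
 * immediately remove, a private System V shared memory segment of
 * `bytes` bytes.  Returns 0 on success, or the errno value reported
 * by the failing call.
 */
int probe_shm(size_t bytes)
{
    int id = shmget(IPC_PRIVATE, bytes, IPC_CREAT | 0600);
    if (id == -1) {
        int err = errno;
        fprintf(stderr, "shmget(%lu bytes) failed: %s\n",
                (unsigned long) bytes, strerror(err));
        return err;
    }

    /* Mark the segment for removal so nothing is left behind. */
    if (shmctl(id, IPC_RMID, NULL) == -1) {
        return errno;
    }
    return 0;
}
```

Calling probe_shm() with increasing sizes from a small main() shows
roughly where the kernel starts refusing a single segment. A request
above the per-segment maximum typically fails with EINVAL. Note that
this only exercises the per-segment size limit, not the limit on the
number of segments.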

FWIW: Solaris' shared memory size values are set in the /etc/system file.
This page has a pretty good explanation of Solaris shared memory:

http://sunsite.uakom.sk/sunworldonline/swol-09-1997/swol-09-insidesolaris.html

My recommendation: don't worry about this failure. If you increase your
shmem limits (see the User docs on the sysv rpi module for how much memory
is required), the test will pass. But if you're never going to run more
than a few processes on a Solaris node, I wouldn't worry about this.
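
For reference, raising the limits in /etc/system would look something
like the fragment below. The numbers are placeholders that I am assuming
purely for illustration, not values taken from the LAM docs; check the
sysv RPI section of the User docs for what your process count actually
needs. Lines starting with "*" are comments in /etc/system, and changes
there only take effect after a reboot.

```
* Illustrative values only (assumed, not from the LAM documentation).
* Maximum size, in bytes, of a single shared memory segment (64 MB here).
set shmsys:shminfo_shmmax=67108864
* Maximum number of shared memory identifiers system-wide.
set shmsys:shminfo_shmmni=256
```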

On Wed, 9 Jun 2004, Martyn Klassen wrote:

> I'm running the test suite for LAM 7.0.6 on a dual-CPU SB2500 running Solaris
> 9 04/04 and using Sun Studio 8 compilers. The test is running on a single
> 2-CPU node. It fails on the topo tests as shown below.
>
> Making check in topo
> make[1]: Entering directory `/rabi/scratch/lamtests-7.0.6/topo'
> make check-TESTS
> make[2]: Entering directory `/rabi/scratch/lamtests-7.0.6/topo'
> mpirun -x TEST -s h -np 6 -ssi rpi crtcp
> /rabi/scratch/lamtests-7.0.6/topo/./cart
> mpirun -x TEST -s h -np 6 -ssi rpi lamd
> /rabi/scratch/lamtests-7.0.6/topo/./cart
> mpirun -x TEST -s h -np 6 -ssi rpi sysv
> /rabi/scratch/lamtests-7.0.6/topo/./cart
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> The selected RPI failed to initialize during MPI_INIT. This is a
> fatal error; I must abort.
>
> This occurred on host rabiThe selected RPI failed to initialize during
> MPI_INIT.
> This is a
> (n0).
> The PID of failed process was fatal error; I must abort.
>
> 7358 (MPI_COMM_WORLD rank: 1)
> This occurred on host
> rabi--------------------------------------------------------------------
> --------
> -
> (n0).
> The PID of failed process was 7359 (MPI_COMM_WORLD rank: 2)
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> The selected RPI failed to initialize during MPI_INIT. This is a
> fatal error; I must abort.
>
> This occurred on host rabi (n0).
> The PID of failed process was 7360 (MPI_COMM_WORLD rank: 3)
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 7357 failed on node n0 (198.20.40.203) with exit status 1.
> -----------------------------------------------------------------------------
> mpirun -x TEST -s h -np 6 -ssi rpi tcp
> /rabi/scratch/lamtests-7.0.6/topo/./cart
> mpirun -x TEST -s h -np 6 -ssi rpi usysv
> /rabi/scratch/lamtests-7.0.6/topo/./cart
> FAIL: cart
> mpirun -x TEST -s h C -ssi rpi crtcp
> /rabi/scratch/lamtests-7.0.6/topo/./dimscreate
> mpirun -x TEST -s h C -ssi rpi lamd
> /rabi/scratch/lamtests-7.0.6/topo/./dimscreate
> mpirun -x TEST -s h C -ssi rpi sysv
> /rabi/scratch/lamtests-7.0.6/topo/./dimscreate
> mpirun -x TEST -s h C -ssi rpi tcp
> /rabi/scratch/lamtests-7.0.6/topo/./dimscreate
> mpirun -x TEST -s h C -ssi rpi usysv
> /rabi/scratch/lamtests-7.0.6/topo/./dimscreate
> PASS: dimscreate
> mpirun -x TEST -s h -np 4 -ssi rpi crtcp
> /rabi/scratch/lamtests-7.0.6/topo/./graph
> mpirun -x TEST -s h -np 4 -ssi rpi lamd
> /rabi/scratch/lamtests-7.0.6/topo/./graph
> mpirun -x TEST -s h -np 4 -ssi rpi sysv
> /rabi/scratch/lamtests-7.0.6/topo/./graph
> mpirun -x TEST -s h -np 4 -ssi rpi tcp
> /rabi/scratch/lamtests-7.0.6/topo/./graph
> mpirun -x TEST -s h -np 4 -ssi rpi usysv
> /rabi/scratch/lamtests-7.0.6/topo/./graph
> PASS: graph
> mpirun -x TEST -s h -np 6 -ssi rpi crtcp
> /rabi/scratch/lamtests-7.0.6/topo/./sub
> mpirun -x TEST -s h -np 6 -ssi rpi lamd
> /rabi/scratch/lamtests-7.0.6/topo/./sub
> mpirun -x TEST -s h -np 6 -ssi rpi sysv
> /rabi/scratch/lamtests-7.0.6/topo/./sub
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> The selected RPI failed to initialize during MPI_INIT. This is a
> The selected RPI failed to initialize during MPI_INIT. This is a
> fatal error; I must abort.
> fatal error; I must abort.
>
> This occurred on host
> rabiThis occurred on host (n0).
> The PID of failed process was rabi7715 (n0 (MPI_COMM_WORLD rank: ).
> 1)
> -----------------------------------------------------------------------------
> The PID of failed process was 7716 (MPI_COMM_WORLD rank: 2)
> -----------------------------------------------------------------------------
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 7714 failed on node n0 (198.20.40.203) with exit status 1.
> -----------------------------------------------------------------------------
> mpirun -x TEST -s h -np 6 -ssi rpi tcp
> /rabi/scratch/lamtests-7.0.6/topo/./sub
> mpirun -x TEST -s h -np 6 -ssi rpi usysv
> /rabi/scratch/lamtests-7.0.6/topo/./sub
> FAIL: sub
> ===================
> 2 of 4 tests failed
> ===================
> make[2]: *** [check-TESTS] Error 1
> make[2]: Leaving directory `/rabi/scratch/lamtests-7.0.6/topo'
> make[1]: *** [check-am] Error 2
> make[1]: Target `check' not remade because of errors.
> make[1]: Leaving directory `/rabi/scratch/lamtests-7.0.6/topo'
> make[1]: Entering directory `/rabi/scratch/lamtests-7.0.6'
> make[1]: Nothing to be done for `check-am'.
> make[1]: Leaving directory `/rabi/scratch/lamtests-7.0.6'
> make: *** [check-recursive] Error 1
> make: Target `check' not remade because of errors.

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/