
LAM/MPI General User's Mailing List Archives


From: Jeffrey B. Layton (laytonjb_at_[hidden])
Date: 2006-04-11 11:05:54


Good morning,

   I'm having a problem starting an MPI code that was
built with PGI 6.1 and LAM-7.1.2. I get the following
messages when I try to start the code:

n-1<24201> ssi:boot:base:linear: booting n0 (n2004)
n-1<24201> ssi:boot:base:linear: booting n1 (n2005)
n-1<24201> ssi:boot:base:linear: booting n2 (n2006)
n-1<24201> ssi:boot:base:linear: booting n3 (n2007)
n-1<24201> ssi:boot:base:linear: booting n4 (n2008)
n-1<24201> ssi:boot:base:linear: booting n5 (n2009)
n-1<24201> ssi:boot:base:linear: booting n6 (n2010)
n-1<24201> ssi:boot:base:linear: booting n7 (n2011)
n-1<24201> ssi:boot:base:linear: finished
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun chose a different RPI than its peers. For example, at least
the following two processes mismatched in their RPI selections:

    MPI_COMM_WORLD rank 0: tcp (v7.1.0)
    MPI_COMM_WORLD rank 3: usysv (v7.1.0)

All MPI processes must choose the same RPI module and version when
they start. Check your SSI settings and/or the local environment
variables on each node.

I'm using PBS to start the job and here are the relevant parts of
the script:

NET=tcp

lamboot -b -v -ssi rpi $NET $PBS_NODEFILE
mpirun -O -v C ./${EXE} >> ${OUTFILE}
lamhalt -v

where $EXE and $OUTFILE are defined in the script. Any ideas?
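
For reference, LAM's mpirun also accepts an -ssi argument, so one
variation of the script above would name the RPI at launch time as well
as at lamboot time. The sketch below only prints the command line it
would run (a dry run). The mpirun_cmd helper and the a.out placeholder
are illustrative assumptions, not part of the original script, and
whether this resolves the mismatch is untested here.

```shell
#!/bin/sh
# Hypothetical helper: build an mpirun command line that names the RPI
# module explicitly, so every rank is asked for the same one.
#   $1 = RPI module name (e.g. tcp), $2 = executable name
mpirun_cmd() {
    echo "mpirun -ssi rpi $1 -O -v C ./$2"
}

NET=tcp

# Dry run: show the command instead of executing it.
# "a.out" is a placeholder for the real ${EXE}.
mpirun_cmd "$NET" a.out
```

In the job script itself, the printed line would take the place of the
existing mpirun line, with the same ">> ${OUTFILE}" redirection.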

TIA!

Jeff