Good morning,
I'm having a problem starting an MPI code that was
built with PGI 6.1 and LAM-7.1.2. I get the following
messages when I try to start the code:
n-1<24201> ssi:boot:base:linear: booting n0 (n2004)
n-1<24201> ssi:boot:base:linear: booting n1 (n2005)
n-1<24201> ssi:boot:base:linear: booting n2 (n2006)
n-1<24201> ssi:boot:base:linear: booting n3 (n2007)
n-1<24201> ssi:boot:base:linear: booting n4 (n2008)
n-1<24201> ssi:boot:base:linear: booting n5 (n2009)
n-1<24201> ssi:boot:base:linear: booting n6 (n2010)
n-1<24201> ssi:boot:base:linear: booting n7 (n2011)
n-1<24201> ssi:boot:base:linear: finished
-----------------------------------------------------------------------------
It seems that [at least] one of the processes that was started with
mpirun chose a different RPI than its peers. For example, at least
the following two processes mismatched in their RPI selections:
MPI_COMM_WORLD rank 0: tcp (v7.1.0)
MPI_COMM_WORLD rank 3: usysv (v7.1.0)
All MPI processes must choose the same RPI module and version when
they start. Check your SSI settings and/or the local environment
variables on each node.
I'm using PBS to start the job and here are the relevant parts of
the script:
NET=tcp
lamboot -b -v -ssi rpi $NET $PBS_NODEFILE
mpirun -O -v C ./${EXE} >> ${OUTFILE}
lamhalt -v
where $EXE and $OUTFILE are defined in the script. Any ideas?
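One variant that may be worth trying, offered as an untested sketch rather than a confirmed fix: LAM's mpirun also accepts -ssi, so the RPI can be requested on the mpirun line instead of on lamboot. The assumption here is that each MPI process selects its RPI when it starts, so a setting passed only to lamboot may not reach every process. The fragment reuses the names from the script above and is otherwise unchanged.

```shell
# Sketch only: assumes EXE, OUTFILE and PBS_NODEFILE are set as in the original script.
NET=tcp

# Boot the LAM universe without the rpi setting.
lamboot -b -v $PBS_NODEFILE

# Request the RPI module on mpirun so every rank is asked for the same one.
mpirun -ssi rpi $NET -O -v C ./${EXE} >> ${OUTFILE}

lamhalt -v
```

If the mismatch persists with this change, the SSI-related environment variables on each node would be the next thing to compare, as the error text suggests.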
TIA!
Jeff