
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-07-07 08:11:53


You seem to be hitting several different errors:

1. a command line problem where mpirun thinks that "sysv" is the
executable to run
2. a problem compiling the sysv RPI
3. a problem getting the sysv RPI to initialize (I'm not sure how you
got to this point, given #1 and #2)

I think that these errors have gotten mixed up and muddled in the
thread so far. Can you send all the information listed here:

     http://www.lam-mpi.org/using/support/

Specifically, can you show exactly which steps you are following
(including every command you type) for each error?
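
One way to capture that is a small shell wrapper that records each
command line, its output, and its exit status in a single file. This
is only a sketch: the function name run_logged and the log file name
lam-debug.log are made up for illustration and are not part of LAM/MPI.

```shell
#!/bin/sh
# Hypothetical helper: append each command, its output, and its exit
# status to one log file. The default file name is an assumption.
LOG="${LOG:-lam-debug.log}"

run_logged() {
    # Record the command line as it was passed in.
    printf '$ %s\n' "$*" >> "$LOG"
    # Run the command, appending both stdout and stderr to the log.
    "$@" >> "$LOG" 2>&1
    status=$?
    # Record the exit status, then pass it back to the caller.
    printf '[exit status: %d]\n\n' "$status" >> "$LOG"
    return "$status"
}
```

You would then run something like "run_logged laminfo" followed by
"run_logged mpirun -ssi rpi_verbose 1 -ssi rpi sysv -np 4 application"
and attach the log. Note that shell redirections such as "<in >out"
are not passed through the wrapper's arguments, so a run that depends
on them needs to be wrapped in "sh -c '...'" or logged by hand.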

Thanks!

On Jul 6, 2008, at 4:06 AM, Endee wrote:

>
>
> 2008/7/5 Endee <nd1977_at_[hidden]>:
> mpirun -ssi rpi_verbose 1 -ssi rpi sysv -np 4 application <in >out
> gives following error report:
>
> n0<17991> ssi:boot:base:linear_windowed: booting n0 (node18)
> n0<17991> ssi:boot:base:linear_windowed: booting n1 (node17)
> n0<17997> ssi:rpi:sysv: module initializing
> n0<17997> ssi:rpi:sysv:pollyield: 1
> n0<17998> ssi:rpi:sysv: module initializing
> n0<17998> ssi:rpi:sysv:pollyield: 1
> n0<17998> ssi:rpi:sysv:short: 8192 bytes
> n0<17997> ssi:rpi:sysv:short: 8192 bytes
> n0<17998> ssi:rpi:sysv:shmpoolsize: 16777216 bytes
> n0<17997> ssi:rpi:sysv:shmpoolsize: 16777216 bytes
> n0<17998> ssi:rpi:sysv:shmmaxalloc: 65536 bytes
> n0<17997> ssi:rpi:sysv:shmmaxalloc: 65536 bytes
> n0<17998> ssi:rpi:tcp:short: 65536 bytes
> n0<17997> ssi:rpi:tcp:short: 65536 bytes
> n1<14838> ssi:rpi:sysv: module initializing
> n1<14838> ssi:rpi:sysv:pollyield: 1
> n1<14838> ssi:rpi:sysv:short: 8192 bytes
>
> -----------------------------------------------------------------------------
> The selected RPI failed to initialize during MPI_INIT. This is a
> fatal error; I must abort.
>
> This occurred on host node18 (n0).
> The PID of failed process was 17997 (MPI_COMM_WORLD rank: 0)
> -----------------------------------------------------------------------------
> n1<14838> ssi:rpi:sysv:shmpoolsize: 16777216 bytes
> n1<14838> ssi:rpi:sysv:shmmaxalloc: 65536 bytes
> n1<14838> ssi:rpi:tcp:short: 65536 bytes
> n1<14837> ssi:rpi:sysv: module initializing
> n1<14837> ssi:rpi:sysv:pollyield: 1
> n1<14837> ssi:rpi:sysv:short: 8192 bytes
> n1<14837> ssi:rpi:sysv:shmpoolsize: 16777216 bytes
> n1<14837> ssi:rpi:sysv:shmmaxalloc: 65536 bytes
> n1<14837> ssi:rpi:tcp:short: 65536 bytes
>
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 17998 failed on node n0 (192.168.101.18) with exit status 1.
>
> Thanks,
> ND
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
Cisco Systems