On Wed, 7 Jan 2004, Nguyen Hung Vu wrote:
> I compiled my program rk.c and ran it. It said "MPI_Send: process in local
> group is dead." and stopped :)
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error. If
> your process did not finish in error, be sure to include a "return 0" or
> "exit(0)" in your C code before exiting the application.
>
> PID 1378 failed on node n0 (xxx.xxx.xxx.xxx) due to signal 8.
> -----------------------------------------------------------------------------
I'm guessing that you're running on Linux...?
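If so, signal 8 is SIGFPE, the "floating point exception" signal --
meaning that one of your processes performed an illegal arithmetic
operation and therefore died. Despite the name, integer arithmetic can
raise it too; integer division by zero is a common cause.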
The rest of the MPI application eventually discovered this because at
least one process tried to MPI_Send to the dead process, and failed.
So you need to track down the floating point exception in your code.
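
One way to narrow this down is to catch SIGFPE yourself and print why it
was raised. The following is only a minimal sketch, assuming a POSIX
system that supports sigaction() with SA_SIGINFO. The names
fpe_code_name, fpe_reporter and install_fpe_reporter are hypothetical
helpers made up for illustration; they are not part of MPI or LAM. It
also assumes that nothing else in the process replaces the SIGFPE
handler afterwards.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Map the si_code delivered with SIGFPE to a short description. */
static const char *fpe_code_name(int code)
{
    switch (code) {
    case FPE_INTDIV: return "integer divide by zero";
    case FPE_INTOVF: return "integer overflow";
    case FPE_FLTDIV: return "floating-point divide by zero";
    case FPE_FLTOVF: return "floating-point overflow";
    case FPE_FLTUND: return "floating-point underflow";
    case FPE_FLTRES: return "floating-point inexact result";
    case FPE_FLTINV: return "invalid floating-point operation";
    case FPE_FLTSUB: return "subscript out of range";
    default:         return "unknown arithmetic error";
    }
}

/* Write n bytes to stderr; write() is async-signal-safe, printf() is not. */
static void emit(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r; /* nothing useful can be done if this fails */
}

/* Hypothetical handler: report the cause, then exit with a nonzero code. */
static void fpe_reporter(int signo, siginfo_t *info, void *context)
{
    static const char prefix[] = "SIGFPE: ";
    const char *why = fpe_code_name(info->si_code);
    size_t n = 0;

    (void)signo;
    (void)context;
    while (why[n] != '\0') /* length computed by hand inside the handler */
        n++;
    emit(prefix, sizeof prefix - 1);
    emit(why, n);
    emit("\n", 1);
    _exit(1);
}

/* Install the reporter for SIGFPE; returns 0 on success, -1 on error. */
static int install_fpe_reporter(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fpe_reporter;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGFPE, &sa, NULL);
}
```

Calling install_fpe_reporter() near the top of main() in rk.c should make
the failing process say which kind of arithmetic error it hit before it
exits. Compiling with -g and running the failing process under a
debugger is the usual next step for finding the exact line.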
If you're not running on Linux, signal 8 may be something else -- you'll
have to look at your local documentation to see what 8 is (it may or may
not be floating point exception).
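
If the local documentation is not at hand, the C library can usually
tell you what a signal number means. This is a small sketch, assuming
your system provides the POSIX strsignal() function; describe_signal is
a hypothetical helper name used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Format "signal N: <description>" into buf, using whatever text the
 * local C library associates with signal number signo.
 * Returns the snprintf() result.
 */
static int describe_signal(int signo, char *buf, size_t len)
{
    const char *text = strsignal(signo);

    if (text == NULL)   /* some systems may return NULL for bad numbers */
        text = "unknown";
    return snprintf(buf, len, "signal %d: %s", signo, text);
}
```

Calling describe_signal(8, buf, sizeof buf) and printing buf shows how
your platform names signal 8. On Linux with glibc the description is
typically "Floating point exception"; other systems may word it
differently or assign number 8 to a different signal.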
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/