
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-04-17 17:32:33


It certainly looks like an application error (signal 11). Can't say
for sure, of course, but what you can say for sure is that LAM is
telling you that at least one process died due to a seg fault.

On Apr 17, 2007, at 2:13 PM, Jeffrey B. Layton wrote:

> Good afternoon,
>
> We're using a commercial application (XFDTD) that sometimes gives us
> the following error message:
>
> -----------------------------------------------------------------------------
> One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.
> If your process did not finish in error, be sure to include a "return
> 0" or "exit(0)" in your C code before exiting the application.
>
> PID 16864 failed on node n0 (172.31.1.1) due to signal 11.
> -----------------------------------------------------------------------------
>
> LAM 7.1.2/MPI 2 C++/ROMIO - Indiana University
>
> Shutting down LAM
> hreq: waiting for HALT ACKs from remote LAM daemons
> hreq: received HALT_ACK from n1 (n02)
> hreq: received HALT_ACK from n2 (n03)
> hreq: received HALT_ACK from n3 (n04)
> hreq: received HALT_ACK from n0 (n01)
> lamhalt: sleeping to wait for lamds to die
> LAM halted
>
>
>
> I assume this is an application error - is this correct? (I just want
> to be absolutely sure).
>
> Thanks!
>
> Jeff
>
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/

-- 
Jeff Squyres
Cisco Systems