I'll reply the same way I did when you mailed me this same question directly
(off-list) two days ago:
-----
The error message you are getting indicates that your program is exiting
with signal 11. This is a segmentation violation, meaning that your program
tried to access memory it is not allowed to touch. There is an error in
your program somewhere, probably a bad pointer reference or something
along those lines. Hence, this is simply a matter of debugging your
program to fix this problem.
-----
:-)
More specifically, this is most likely not a problem with LAM, but rather
a problem in your application. You need to find *where* the problem is
occurring in your application, then figure out *why* it is happening, and
then figure out *how* to fix it. Fire up a debugger, look at any core files
that may be generated, and use standard debugging techniques.
There is no general answer that anyone can give -- it's something to do
specifically with your application.
On Fri, 11 Jun 2004, sachin kothe wrote:
> We are from V.N.I.T. Nagpur (India).
>
> We have 4 computers with Intel's HT technology, SCSI hard drives and
> 2GB RAM. We are trying to build a cluster out of these machines (one
> server and the others as nodes). It works for hello.c and for matrix
> multiplication (which was downloaded from the internet). Then we ran a
> Monte Carlo application on this cluster. The program compiles properly,
> but when run with "mpirun C <program_name>" it gives the following message.
> ----------------------------------------------------------------------
> "One of the processes started by mpirun has exited with a nonzero exit
> code. This typically indicates that the process finished in error.If
> your process did not finish in error, be sure to include a return 0 or
> exit(0) in your C code before exiting the application.
>
> PID 4628 failed on node n1 (192.168.3.49) due to signal 11. "
> ----------------------------------------------------------------------
> We are stuck at this point and unable to proceed further. The same error
> is described in the troubleshooting section of your manual, but no
> solution is provided. What should we do in such a situation? Please
> help us solve this problem.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/