Thanks for your reply.
Our program was running properly on a cluster built
upon Pentium II machines. The same program that we
mentioned earlier is not running on our current platform
of Intel Pentium 4 machines with Hyper-Threading (HT).
If you can test the program, we can send it to you.
Can you please help us understand what a signal 11
error is?
Thanks
--- Jeff Squyres <jsquyres_at_[hidden]> wrote:
> I'll reply the same way as when you mailed me this same question
> directly (off-list) 2 days ago:
>
> -----
> The error message you are getting indicates that your program is
> exiting with a signal 11. This is a segmentation violation, meaning
> that you have an error in your program somewhere; probably something
> to do with a bad pointer reference, or something along those lines.
> Hence, this is simply a matter of debugging your program to fix this
> problem.
> -----
>
> :-)
>
> More specifically, this is likely not to be a problem with LAM, but
> rather a problem in your application. You need to find *where* the
> problem is occurring in your application, then figure out *why* it is
> happening, and then figure out *how* to fix it. Fire up a debugger,
> look at any corefiles that may be generated, and use standard
> debugging techniques.
>
> There is no general answer that anyone can give -- it's something to
> do specifically with your application.
>
>
> On Fri, 11 Jun 2004, sachin kothe wrote:
>
> > We are from V.N.I.T. Nagpur (India).
> >
> > We have 4 computers with Intel HT technology, SCSI hard drives and
> > 2GB RAM. We are trying to build a cluster out of these machines (one
> > server and the other machines as nodes). It works for hello.c and for
> > matrix multiplication (which was downloaded from the internet). Then
> > we ran a Monte Carlo application on this cluster. This program
> > compiles properly, but on running it with "mpirun C <program_name>",
> > it gives the following message:
> >
> >
> > ----------------------------------------------------------------------
> > "One of the processes started by mpirun has exited with a nonzero exit
> > code. This typically indicates that the process finished in error. If
> > your process did not finish in error, be sure to include a return 0 or
> > exit(0) in your C code before exiting the application.
> >
> > PID 4628 failed on node n1 (192.168.3.49) due to signal 11."
> > ----------------------------------------------------------------------
>
> > We are stuck at this point and unable to proceed further. The same
> > error is described in the troubleshooting section of your manual, but
> > no solution is given there. What should we do in this situation?
> > Please help us solve this problem.
>
> --
> {+} Jeff Squyres
> {+} jsquyres_at_[hidden]
> {+} http://www.lam-mpi.org/
> _______________________________________________
> This list is archived at
> http://www.lam-mpi.org/MailArchives/lam/