On Sat, 12 Jun 2004, Ganesh Iyer wrote:
> our program was running properly on cluster built upon pentium 2
> machines. The same program that we mention early is not running on
> current platform of intel HT,p4 machines. If u can test program we can
> send it to u.
Unfortunately, I don't have the time to debug individual user programs.
I wish that I could help more, but our help is mostly limited to debugging
problems with LAM itself and providing suggestions about where to look for
problems in your application. It's mainly a "number of hours in the day"
kind of issue, combined with a bias of "students should learn the hard
way." :-)
I'm not claiming that LAM is bug-free -- that would be foolish. But seg
faults are *usually* the fault of the application, so that's where one
should start looking. Debugging is about *proving* where the error is
occurring; it can be somewhat of a black art that can best be learned by
doing (IMHO).
Also note that just because your application ran on one platform and
doesn't run on another doesn't mean that the application is correct; there
are many situations where an application can get "lucky" and work for
months before it decides to stop working (sometimes seemingly for no
reason). I think many software engineers on this list would agree. :-)
> can u please help us to understand what signal 11 error is?
Loosely speaking, a segmentation violation is when you try to access
memory that does not belong to you. A related error, the bus error, is
when you try to access an illegal (or nonexistent) address. If you get
either of these two errors, it typically means a problem with pointers or
arrays somewhere in your application.
Add some printf's in your program to find out where it's failing, or,
better yet, use a debugger (perhaps starting with a memory-checking
debugger such as valgrind -- ensure that you configured and installed LAM
with the --with-purify option). Running your program through a
memory-checking debugger can be an enlightening experience; I say this
because we do this all the time to ensure that LAM itself is functioning
properly.
See the LAM FAQ for issues about debugging in parallel. Hope this helps.
--
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/