FWIW, LAM buffers by the line. So if you get a segv, the output you
see may only be 1 or 2 lines behind (from the faulting process, that
is). LAM's out-of-band signaling that a process has died *usually*
travels at about the same speed as the output from the processes
(specifically: they use the same out-of-band channels).
When I run into problems like this, I usually either add fflush()
calls to my C code or (preferably) use a debugger. You might want to
attach gdb and/or valgrind and/or your favorite debugger and run your
processes through it -- then you should get a solid indication of
where it is failing.
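For the fflush() approach, here is a minimal sketch in C. The helper
name checkpoint() and the label strings are made up for illustration;
only fprintf(), fflush(), and setvbuf() are standard C. Note that this
only empties the process's own stdio buffer; it does not change how LAM
forwards the output afterwards.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of LAM or MPI): print a progress
 * marker and flush the stream immediately, so the line has left this
 * process's stdio buffer before any later crash.
 * Returns 0 on success, non-zero on failure.
 */
static int checkpoint(FILE *stream, const char *label)
{
    if (fprintf(stream, "checkpoint: %s\n", label) < 0)
        return -1;
    /* fflush() returns 0 on success and EOF on error. */
    return fflush(stream);
}

/*
 * Alternative: turn off stdio buffering for a stream altogether.
 * Must be called before any other I/O is done on that stream.
 */
static int make_unbuffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IONBF, 0);
}
```

Calling checkpoint(stdout, "before solver") and checkpoint(stdout,
"after solver") around the suspect code should narrow down where the
segv happens; the last marker you see is at or just before the fault.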
Check the LAM FAQ for information about debugging in parallel.
On Jan 25, 2006, at 9:42 PM, Tim Prince wrote:
> Adams Samuel D Contr AFRL/HEDR wrote:
>> I am wondering if you guys know how to get unbuffered I/O to work. I
>> am working on one of our MPI programs, and I am now getting a
>> segmentation fault. I am trying to track down the guilty code with
>> write(0,*) statements, which should go to stderr, but the output
>> looks like it is still being buffered. I presume that the MPI is
>> doing the buffering, and when the segmentation fault happens, the MPI
>> stuff still hasn't flushed the buffer. Do you know of any way to
>> manually flush the MPI buffer, or a better way to track down the bug?
>>
> stdout behavior depends on the run-time and compile and run-time
> options of the compiler in use, as well as on the characteristics of
> the MPI. Only a few Fortran compilers would connect unit 0 to stderr
> by default.
> "the bug" might consist of the expectation that all Fortran compilers
> and all MPI implementations will behave in a certain way.
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/