Hi,
I have made a software framework based on LAM/MPI (6.5.9). The software is
used by several people on Linux PCs and on an SGI Onyx2. Recently, we
have had problems with the Onyx2. When running programs, we received error
messages like this:
>Avocado Fatal: pfMemory::new() Unable to allocate 236 bytes from arena
0x60004000.
After some time, we discovered that the problem was the reservation of
semaphores and shared memory keys by LAM/MPI, since the rest of the
software framework does not use these.
Each application created with this software framework consists of
multiple executable files. When a program crashes, it appears that
one of the processes exits without releasing the memory it has allocated.
I have tried to compensate for this by adding a signal handler to every
process. The handler uses "psignal" to print an error message and the
"MPI_Abort" command to stop LAM/MPI. This, however, results in some of the
processes exiting without calling the signal handler.
I have therefore compensated for this unexpected behaviour by using
"MPI_Finalize" and "exit" instead of "MPI_Abort" in the signal handler.
With this change, only the process that crashes calls the signal handler.
However, when Ctrl-C is pressed, the rest of the processes also call the
signal handler.
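
The handler pattern described above can be sketched roughly as follows.
This is a simplified illustration, not the framework's actual code: the
MPI calls appear only as a comment so that the fragment compiles without
LAM/MPI, and the names crash_handler, install_crash_handler, last_signal
and exit_after_report are made up for the example.

```c
#define _XOPEN_SOURCE 700   /* expose sigaction, psignal and _exit */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Last signal seen by the handler; sig_atomic_t may be written from a handler. */
static volatile sig_atomic_t last_signal = 0;

/* Set to 0 only for dry runs, so that the handler returns instead of exiting. */
static volatile sig_atomic_t exit_after_report = 1;

/* Prints the signal description, records the signal and terminates. */
static void crash_handler(int sig)
{
    /* psignal is not guaranteed to be async-signal-safe; it is used
       here only because the post describes it. */
    psignal(sig, "caught signal");
    last_signal = sig;

    /* In the MPI program, this is where MPI_Abort(MPI_COMM_WORLD, 1),
       or MPI_Finalize() followed by exit(), would be called. */
    if (exit_after_report)
        _exit(EXIT_FAILURE);
}

/* Installs crash_handler for a few common signals.
   Returns 0 on success, -1 if any installation fails. */
static int install_crash_handler(void)
{
    static const int signals[] = { SIGSEGV, SIGFPE, SIGINT, SIGTERM };
    struct sigaction sa;
    size_t i;

    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    for (i = 0; i < sizeof signals / sizeof signals[0]; ++i) {
        if (sigaction(signals[i], &sa, NULL) != 0)
            return -1;
    }
    return 0;
}
```

Each process would call install_crash_handler() once, shortly after
start-up. Since SIGINT is in the list, Ctrl-C reaches the handler in every
process that belongs to the terminal's foreground process group.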
What is the correct way to deal with this?
Best regards,
Henrik Nagel
--
Henrik Rojas Nagel Phone: +45 9635 9786
Ph.D. student, 3D Visual Data Mining Fax: +45 9815 2444
Lab. of Computer Vision and Media Tech. WWW: http://www.cvmt.dk/~hrn
Aalborg University, Denmark E-mail: hrn_at_[hidden]