
LAM/MPI General User's Mailing List Archives


From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2005-05-21 08:39:26


All things being equal, no, they are not big problems.

Note that you are running valgrind against mpirun itself, not your MPI
application. Also note that these are not memory leaks -- they are
blocks that are still in use when the process exits (there's a
difference). So it's not that we're losing memory in mpirun, it's just
that we're not calling free() before exiting.
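To make the distinction concrete, here is a minimal sketch in C of the two categories valgrind reports. The names (`g_config`, `keep_reachable`, `lose_block`, `reachable_text`) are made up for illustration and are not LAM code. A block counts as "still reachable" when some pointer to it still exists at exit, and as "definitely lost" when no pointer to it can be found any more.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical global, not from LAM: it holds a heap block for the
 * whole life of the process. */
static char *g_config = NULL;

/* "Still reachable": the block is never freed before exit, but
 * g_config still points at it, so nothing has been lost track of. */
int keep_reachable(const char *text)
{
    size_t len;

    assert(text != NULL);
    len = strlen(text) + 1;

    free(g_config);              /* free(NULL) is a no-op */
    g_config = malloc(len);
    if (g_config == NULL)
        return -1;
    memcpy(g_config, text, len);
    return 0;
}

/* "Definitely lost": the only pointer to the block is discarded, so
 * the program could never free it even if it wanted to. */
int lose_block(size_t nbytes)
{
    char *p = malloc(nbytes ? nbytes : 1);

    if (p == NULL)
        return -1;
    p[0] = '\0';
    p = NULL;                    /* last pointer to the block is gone */
    return 0;
}

/* Accessor so a caller can see that the reachable block is still valid. */
const char *reachable_text(void)
{
    return g_config;
}
```

If these were called from a small main() built without optimization and run under `valgrind --leak-check=full --show-reachable=yes`, I would expect the block from keep_reachable() to show up as "still reachable" and the one from lose_block() as "definitely lost", although the exact classification can vary with the compiler and valgrind version.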

I agree that in a perfect world with a purist model, mpirun would be
totally valgrind clean. However, the reality is that a) the OS will
free those blocks when mpirun exits, b) it's a constant [small] number
of blocks (relative to the number of processes and a few other
factors), and c) there were always more interesting / important things
to work on in LAM than chasing down the relevant memory blocks to
free(). :-)

The MPI library itself is clean, however, which is much more important
(subject to the disclaimers in the FAQ, which I'm guessing you didn't
read ;-) since you got an "uninitialized" message from valgrind). See
the "Debugging" section of the FAQ, in particular the questions:

- Can I run MPI programs with memory-checking tools such as bcheck,
valgrind, or purify?
- Is LAM purify clean?
- Why does my memory-checking debugger report memory leaks in LAM?
- Why does my memory-checking debugger report "read from uninitialized"
in LAM?
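For background on that kind of message in general: one common way to get "Syscall param write(buf) contains uninitialised byte(s)" is to write() a whole struct whose padding or unused fields were never set. The sketch below illustrates only that general pattern. `struct record` and `send_record` are hypothetical names, and I am not claiming this is exactly what mpirun does; the FAQ has the authoritative explanation.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message layout. On most ABIs the compiler inserts
 * padding bytes between tag and value. */
struct record {
    char tag;
    int  value;
};

/* Write one record to fd. When zero_first is 0, the padding bytes are
 * never initialized; the data that matters is correct, but a memory
 * checker can still warn because write() is handed every byte of the
 * struct. When zero_first is nonzero, memset() initializes the whole
 * struct first, which typically silences the warning. */
ssize_t send_record(int fd, char tag, int value, int zero_first)
{
    struct record r;

    assert(fd >= 0);
    if (zero_first)
        memset(&r, 0, sizeof r);
    r.tag = tag;
    r.value = value;
    return write(fd, &r, sizeof r);
}
```

Either way the receiver sees the same tag and value; the only difference is whether the bytes nobody reads were initialized, which is why this kind of report is usually harmless.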

On May 21, 2005, at 9:27 AM, verleyen.wim_at_[hidden] wrote:

> Hello,
>
> I'm curious whether the following memory leaks are really bad. Here is
> my valgrind log.
> And if they are serious, what can be done about them? I haven't
> checked the MPI code yet...
> ==14899== Syscall param write(buf) contains uninitialised or unaddressable byte(s)
> ==14899== at 0x1BA2D748: write (in /lib/libc-2.3.2.so)
> ==14899== by 0x805C222: mwrite (in /usr/local/bin/mpirun)
> ==14899== by 0x8055435: _cio_kreqfront (in /usr/local/bin/mpirun)
> ==14899== by 0x805DFAC: kdetach (in /usr/local/bin/mpirun)
> ==14899== Address 0x52BFE73C is on thread 1's stack
> ==14899==
> ==14899== ERROR SUMMARY: 127 errors from 10 contexts (suppressed: 20 from 2)
> ==14899== malloc/free: in use at exit: 752 bytes in 36 blocks.
> ==14899== malloc/free: 368 allocs, 332 frees, 34298 bytes allocated.
> ==14899== For counts of detected errors, rerun with: -v
> ==14899== searching for pointers to 36 not-freed blocks.
> ==14899== checked 1823500 bytes.
> ==14899==
> ==14899==
> ==14899== 5 bytes in 2 blocks are definitely lost in loss record 1 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x804DC60: sfh_argv_break_quoted (in /usr/local/bin/mpirun)
> ==14899== by 0x804FF4A: parseline (in /usr/local/bin/mpirun)
> ==14899== by 0x804FD62: asc_bufparse (in /usr/local/bin/mpirun)
> ==14899==
> ==14899==
> ==14899== 6 bytes in 2 blocks are definitely lost in loss record 2 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x804DEB4: sfh_argv_add (in /usr/local/bin/mpirun)
> ==14899== by 0x80507C3: asc_compat (in /usr/local/bin/mpirun)
> ==14899== by 0x804A54D: main (mpirun.c:245)
> ==14899==
> ==14899==
> ==14899== 8 bytes in 1 blocks are still reachable in loss record 3 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x8063849: lam_ssi_base_module_find (in /usr/local/bin/mpirun)
> ==14899== by 0x8057E73: lam_ssi_crlam_base_open (in /usr/local/bin/mpirun)
> ==14899== by 0x804ACE2: main (mpirun.c:551)
> ==14899==
> ==14899==
> ==14899== 12 bytes in 1 blocks are still reachable in loss record 4 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x8050171: parseline (in /usr/local/bin/mpirun)
> ==14899== by 0x804FD62: asc_bufparse (in /usr/local/bin/mpirun)
> ==14899== by 0x804B332: build_app (mpirun.c:742)
> ==14899==
> ==14899==
> ==14899== 12 bytes in 1 blocks are still reachable in loss record 5 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x80500CE: parseline (in /usr/local/bin/mpirun)
> ==14899== by 0x804FD62: asc_bufparse (in /usr/local/bin/mpirun)
> ==14899== by 0x804B332: build_app (mpirun.c:742)
> ==14899==
> ==14899==
> ==14899== 12 bytes in 1 blocks are definitely lost in loss record 6 of 16
> ==14899== at 0x1B9049DF: realloc (vg_replace_malloc.c:197)
> ==14899== by 0x804DF17: sfh_argv_add (in /usr/local/bin/mpirun)
> ==14899== by 0x80593E7: ndi_parse (in /usr/local/bin/mpirun)
> ==14899== by 0x804FFCC: parseline (in /usr/local/bin/mpirun)
> ==14899==
> ==14899==
> ==14899== 16 bytes in 1 blocks are still reachable in loss record 7 of 16
> ==14899== at 0x1B9048E1: calloc (vg_replace_malloc.c:176)
> ==14899== by 0x1B91F360: _dlerror_run (in /lib/libdl-2.3.2.so)
> ==14899== by 0x1B91F0B9: dlvsym (in /lib/libdl-2.3.2.so)
> ==14899== by 0x1B9308A1: __errno_location (vg_libpthread.c:2129)
> ==14899==
> ==14899==
> ==14899== 24 bytes in 5 blocks are still reachable in loss record 8 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x804DC60: sfh_argv_break_quoted (in /usr/local/bin/mpirun)
> ==14899== by 0x804FF4A: parseline (in /usr/local/bin/mpirun)
> ==14899== by 0x804FD62: asc_bufparse (in /usr/local/bin/mpirun)
> ==14899==
> ==14899==
> ==14899== 32 bytes in 1 blocks are definitely lost in loss record 9 of 16
> ==14899== at 0x1B9049DF: realloc (vg_replace_malloc.c:197)
> ==14899== by 0x804DCC7: sfh_argv_break_quoted (in /usr/local/bin/mpirun)
> ==14899== by 0x804FF4A: parseline (in /usr/local/bin/mpirun)
> ==14899== by 0x804FD62: asc_bufparse (in /usr/local/bin/mpirun)
> ==14899==
> ==14899==
> ==14899== 48 bytes in 2 blocks are still reachable in loss record 10 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x80516D1: al_init (in /usr/local/bin/mpirun)
> ==14899== by 0x804F1DB: nid_parse (in /usr/local/bin/mpirun)
> ==14899== by 0x804AF34: build_app (mpirun.c:643)
> ==14899==
> ==14899==
> ==14899== 48 bytes in 2 blocks are still reachable in loss record 11 of 16
> ==14899== at 0x1B9049DF: realloc (vg_replace_malloc.c:197)
> ==14899== by 0x804DF17: sfh_argv_add (in /usr/local/bin/mpirun)
> ==14899== by 0x8050728: asc_compat (in /usr/local/bin/mpirun)
> ==14899== by 0x804A54D: main (mpirun.c:245)
> ==14899==
> ==14899==
> ==14899== 57 bytes in 10 blocks are still reachable in loss record 12 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x804DEB4: sfh_argv_add (in /usr/local/bin/mpirun)
> ==14899== by 0x805065D: asc_compat (in /usr/local/bin/mpirun)
> ==14899== by 0x804A54D: main (mpirun.c:245)
> ==14899==
> ==14899==
> ==14899== 64 bytes in 1 blocks are still reachable in loss record 13 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x804A8B6: main (mpirun.c:344)
> ==14899==
> ==14899==
> ==14899== 64 bytes in 1 blocks are still reachable in loss record 14 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x804A8A1: main (mpirun.c:343)
> ==14899==
> ==14899==
> ==14899== 144 bytes in 4 blocks are still reachable in loss record 15 of 16
> ==14899== at 0x1B903EBD: malloc (vg_replace_malloc.c:131)
> ==14899== by 0x805171E: al_append (in /usr/local/bin/mpirun)
> ==14899== by 0x8051441: asc_schedule (in /usr/local/bin/mpirun)
> ==14899== by 0x804B3F6: build_app (mpirun.c:772)
> ==14899==
> ==14899== LEAK SUMMARY:
> ==14899== definitely lost: 55 bytes in 6 blocks.
> ==14899== possibly lost: 0 bytes in 0 blocks.
> ==14899== still reachable: 497 bytes in 29 blocks.
> ==14899== suppressed: 200 bytes in 1 blocks.
>
> Kind regards,
> Wim
>
>
> _______________________________________________
> This list is archived at http://www.lam-mpi.org/MailArchives/lam/
>

-- 
{+} Jeff Squyres
{+} jsquyres_at_[hidden]
{+} http://www.lam-mpi.org/