Linux Applications Debugging Techniques/Leaks

What to look for
Memory can be allocated through many API calls:


 * 1) <tt>malloc</tt>
 * 2) <tt>calloc</tt>
 * 3) <tt>realloc</tt>
 * 4) <tt>memalign</tt>
 * 5) <tt>posix_memalign</tt>
 * 6) <tt>valloc</tt>
 * 7) <tt>mmap</tt>
 * 8) <tt>brk</tt> / <tt>sbrk</tt>

To return memory to the OS:
 * 1) <tt>free</tt>
 * 2) <tt>munmap</tt>

Valgrind
Valgrind should be the first stop for any memory-related issue. However:


 * 1) it slows down the program by at least one order of magnitude; server C++ programs in particular can be slowed down 15-20 times.
 * 2) from experience, some versions might have difficulties tracking <tt>mmap</tt>-allocated memory.
 * 3) on amd64, the VEX disassembler (as of v3.7) is likely to fail sooner rather than later, so Valgrind is of little use for medium or intensive workloads.
 * 4) you need to write suppressions to filter down the issues reported.

If these drawbacks are a problem in your case, lighter-weight solutions are available.

mudflap

 * http://gcc.gnu.org/wiki/Mudflap_Pointer_Debugging

Note: Mudflap has been removed from GCC 4.9 and later.

memleax

 * memleax

Memleax debugs memory leaks in a running process by attaching to it, without recompiling the program or restarting the target process.

This makes it very convenient and suitable for production environments.

It works on GNU/Linux-x86_64 and FreeBSD-amd64.

gcc/clang flags
See https://lemire.me/blog/2016/04/20/no-more-leaks-with-sanitize-flags-in-gcc-and-clang/
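For instance, a minimal sketch (the file name and program are illustrative): compiled with the address sanitizer, the leak below is reported when the program exits, together with a symbolized allocation stack.

<syntaxhighlight lang="cpp">
// Build and run with GCC or Clang:
//   g++ -g -fno-omit-frame-pointer -fsanitize=address leak.cpp -o leak
//   ./leak
#include <cstdlib>

int main()
{
    void* p = std::malloc(64);   // never freed: LeakSanitizer flags it at exit
    (void)p;
    return 0;
}
</syntaxhighlight>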

bcc
See https://github.com/iovisor/bcc/blob/master/tools/memleak.py in https://github.com/iovisor/bcc.

libmemleak

 * libmemleak

Libmemleak finds memory leaks that cause a process to slowly increase the amount of memory it uses, also without the need to recompile the program, as it can be LD_PRELOAD-ed when starting the program under test. Unlike valgrind, it hardly slows down the process under test. Leaks are reported on a per-backtrace basis. This is sometimes very important, because often a caller deeper in the backtrace is responsible for the leak (by not freeing the memory), while the actual place where the allocation happens tells you nothing.

It's been tested on GNU/Linux-x86_64.

DIY:libmtrace
The GNU C library comes with built-in functionality to help detect memory issues: <tt>mtrace</tt>. One of its shortcomings: it does not log the call stacks of the memory allocations it tracks. We can build an interposition library to augment <tt>mtrace</tt>.

The basics
The malloc implementation in the GNU C library provides a simple but powerful way to detect memory leaks and to obtain some information about the locations where the leaks occur, and this with rather minimal speed penalties for the program.

Getting started is as simple as it can be:


 * <tt>#include <mcheck.h></tt> in your code.
 * Call <tt>mtrace</tt> to install hooks for <tt>malloc</tt>, <tt>realloc</tt>, <tt>free</tt> and <tt>memalign</tt>. From this point on, all memory manipulations by these functions will be tracked. Note there are other untracked ways to allocate memory.
 * Call <tt>muntrace</tt> to uninstall the tracking handlers.
 * Recompile.

Under the hood, <tt>mtrace</tt> installs the four hooks mentioned above. The information collected through the hooks is written to a log file.

Note: there are other ways to allocate memory, notably <tt>mmap</tt>. These allocations will not be reported, unfortunately.
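
A minimal instrumented program might look like this (a sketch: the deliberate leaks, the file name and the build commands are only illustrative):

<syntaxhighlight lang="cpp">
// main.cpp - build, run and post-process roughly as follows:
//   g++ -g main.cpp -o main
//   MALLOC_TRACE=./mtrace.log ./main
//   mtrace ./main ./mtrace.log      # glibc's mtrace post-processing command
#include <mcheck.h>
#include <cstdlib>
#include <string>

int main()
{
    mtrace();                            // install the tracking hooks

    void* leak1 = std::malloc(1024);     // leaked: attributed to this file:line
    (void)leak1;

    std::string* leak2 = new std::string("this string leaks as well");
    (void)leak2;                         // the underlying malloc calls happen inside
                                         // libstdc++, so file:line points there

    muntrace();                          // uninstall the hooks, flush the log
    return 0;
}
</syntaxhighlight>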

Next:
 * Set the <tt>MALLOC_TRACE</tt> environment variable to the name of the memory log file.
 * Run the program.
 * Run the memory log through the <tt>mtrace</tt> command that ships with glibc.

One of the leaks (the <tt>malloc</tt> call) was precisely traced to the exact file and line number. The other leaks, at line 25, are detected, but we do not know where they occur: the two memory allocations for the <tt>std::string</tt> are buried deep inside the C++ library. We would need the stack trace for these two leaks to pinpoint the place in our code.

We can use GDB (or the trace_call macro) to get the allocations' stacks.

A couple of improvements
It would be good to have <tt>mtrace</tt> itself dump the allocation stack and dispense with gdb. The modified <tt>mtrace</tt> would have to supplement the information with:


 * The stack trace for each allocation.
 * Demangled function names.
 * File name and line number.

Additionally, we can put the code in a library, to free the program from being instrumented with <tt>mtrace</tt>. In this case, all we have to do is interpose the library when we want to trace memory allocations (and pay the performance price). Note: getting all this information at runtime, particularly in a human-readable form, will have a performance impact on the program, unlike the plain vanilla <tt>mtrace</tt> supplied with glibc.

The stack trace
A good start would be to use another API function: <tt>backtrace_symbols_fd</tt>. This would print the stack directly to the log file. That is perfect for a C program, but C++ symbols come out mangled.

For C++ we would have to get the stack (<tt>backtrace_symbols</tt>), resolve each address (<tt>dladdr</tt>) and demangle each symbol name (<tt>abi::__cxa_demangle</tt>).
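
A sketch of that resolve-and-demangle step might look like this (not the library's actual code; <tt>dump_stack</tt> is an arbitrary name and the output goes to stderr rather than the log file). Link with <tt>-ldl</tt> and build the executable with <tt>-rdynamic</tt> so its own symbols can be resolved:

<syntaxhighlight lang="cpp">
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        // for dladdr
#endif
#include <execinfo.h>      // backtrace
#include <dlfcn.h>         // dladdr
#include <cxxabi.h>        // abi::__cxa_demangle
#include <cstdio>
#include <cstdlib>

void dump_stack()
{
    void* frames[64];
    int depth = backtrace(frames, 64);

    for (int i = 0; i < depth; ++i) {
        Dl_info info;
        if (dladdr(frames[i], &info) && info.dli_sname) {
            int status = 0;
            char* demangled =
                abi::__cxa_demangle(info.dli_sname, nullptr, nullptr, &status);
            std::fprintf(stderr, "  %p %s (%s)\n", frames[i],
                         status == 0 ? demangled : info.dli_sname,
                         info.dli_fname);
            std::free(demangled);           // __cxa_demangle mallocs its result
        } else {
            std::fprintf(stderr, "  %p <unresolved>\n", frames[i]);
        }
    }
}
</syntaxhighlight>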

Caveats

 * Memory allocation is one of those basic operations everything else builds on. One needs to allocate memory to load libraries and executables; one needs to allocate memory to track memory allocations; and we hook into it very early in the life of the process: the first pre-loaded library is the memory tracking library. Thus, any API call we make inside this interposition library can hold surprises, especially in multi-threaded environments.


 * The API functions we use to trace the stack can allocate memory. These allocations also go through the hooks we installed; as we trace the new allocation, the hooks are activated again and yet another allocation is made, and so on. We would run out of stack in this infinite recursion. We break out of this pitfall by using a per-thread flag (see the sketch after this list).


 * The API functions we use to trace the stack can deadlock. Suppose we used a lock while tracing: we take the trace lock and call <tt>dladdr</tt>, which in turn tries to take an internal dynamic linker lock. If, on another thread, <tt>dlopen</tt> is called while we trace, <tt>dlopen</tt> takes the same linker lock and then allocates memory: this triggers the memory hooks, and now the <tt>dlopen</tt> thread waits on the trace lock while holding the linker lock. Deadlock.


 * On some platforms (gcc 4.7.2, amd64) TLS calls would trip the memalign hook. This could result in infinite recursion if the memalign hook, in its turn, accesses a TLS variable.
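
A sketch of the per-thread flag mentioned above (the names are hypothetical; the real tracing work goes where the comment sits):

<syntaxhighlight lang="cpp">
#include <cstddef>

// One guard flag per thread; set while we are inside our own tracing code.
struct TraceGuard
{
    static thread_local bool active;
    TraceGuard()  { active = true; }
    ~TraceGuard() { active = false; }
};
thread_local bool TraceGuard::active = false;

// Hypothetical hook body invoked for every allocation.
void on_allocation(void* p, std::size_t size)
{
    if (TraceGuard::active)
        return;              // re-entered from our own tracing code: pass through
    TraceGuard guard;
    // ... backtrace(), symbol resolution and logging go here; any allocation
    //     they trigger re-enters on_allocation but returns immediately ...
    (void)p; (void)size;
}
</syntaxhighlight>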

What we got
Let's try again with our new library:

Apparently, not much of an improvement: the summary still does not get us back to line 25 in main.cpp. However, if we search for address 8bf89b0 in the trace log, we find this:

This is good, but having file and line information would be better.

File and line
Here we have a few possibilities:


 * Run the address (e.g. 0x40178450 above) through the <tt>addr2line</tt> tool. If the address is in a shared object that the program loaded, it might not resolve properly.
 * If we have a core dump of the program, we can ask gdb to resolve the address. Or we can attach to the running program and resolve the address.
 * Use the API described here. The downside is that it takes a quite heavy toll on the performance of the program.

Resources

 * The GNU C library manual
 * Using libbfd
 * Linux Programming Toolkit

DIY:libmemleak
Building on libmtrace, we can go one step further and have an interposition library track the memory allocations made by the program. The library generates a report on demand, much like Valgrind does.

Libmemleak is significantly faster than valgrind but also has limited functionality (only leak detection).

Operation
<tt>libmtrace</tt> has two drawbacks:
 * The log file quickly grows to gigabytes.
 * You are left grepping the log to figure out what leaks and when.

A better solution would be to have an interposition library to collect memory operations information and to generate a report on-demand.

For <tt>mmap/munmap</tt> we have no choice but to hook these directly. Thus, a call from within the application would first hit the hooks in <tt>libmemleak</tt>, then go to <tt>libc</tt>. For <tt>malloc/realloc/memalign/free</tt> we have two options:
 * Use <tt>mtrace/muntrace</tt> as before, to install hooks that will be called from within <tt>libc</tt>. Thus, a <tt>malloc</tt> call would first go through <tt>libc</tt>, which would then call the hooks in <tt>libmemleak</tt>. This leaves us at the mercy of <tt>libc</tt>.
 * The second solution is to hook these like <tt>m(un)map</tt>.

The second solution also frees <tt>mtrace/muntrace</tt> for on-demand report generation:
 * A first call to <tt>mtrace</tt> will kick in data collection.
 * Subsequent calls to <tt>mtrace</tt> will generate reports.
 * <tt>muntrace</tt> will stop data collection and will generate a final report.
 * <tt>MALLOC_TRACE</tt> is not needed.

The application can then sprinkle its code with <tt>mtrace</tt> calls at strategic places to avoid reporting too much noise. These calls will do nothing in normal operation, as long as <tt>MALLOC_TRACE</tt> is not set. Alternatively, the application can be completely ignorant of the ongoing data collection (no <tt>mtrace</tt> calls within the application code) and <tt>libmemleak</tt> can start collecting as early as when it is loaded and generate one report upon being unloaded.
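
With that scheme, an instrumented application might look like this (a sketch; under stock glibc these calls merely control <tt>MALLOC_TRACE</tt> logging, while under <tt>libmemleak</tt> they drive the reports described above):

<syntaxhighlight lang="cpp">
#include <mcheck.h>
#include <cstdlib>

// Hypothetical unit of work; leaks a little on every call.
static void request_loop()
{
    for (int i = 0; i < 100; ++i)
        std::malloc(32);
}

int main()
{
    mtrace();          // first call: start data collection
    request_loop();
    mtrace();          // subsequent call: generate an interim report
    request_loop();
    muntrace();        // stop collection and generate the final report
    return 0;
}
</syntaxhighlight>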

To control the <tt>libmemleak</tt> functionality, an environment variable - <tt>MEMLEAK_CONFIG</tt> - has to be set before loading the library:


 * <tt>mtraceinit</tt> will instruct the library to start collecting data upon being loaded. The default is off, and the application has to be instrumented with <tt>m(un)trace</tt> calls.

Thus, all the hooks have to do is call into the reporting:
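
For example, the interposed entry points could look roughly like this (a sketch only, not the actual libmemleak code: the collector is reduced here to a single counter where the real library keeps per-backtrace tables, and <tt>memleak_report</tt> is a hypothetical name; <tt>__libc_malloc</tt>/<tt>__libc_free</tt> are glibc's exported implementations, used to sidestep bootstrap problems with <tt>dlsym</tt>):

<syntaxhighlight lang="cpp">
// memleak.cpp - build as a preload library:
//   g++ -std=c++11 -shared -fPIC memleak.cpp -o libmemleak.so
//   LD_PRELOAD=./libmemleak.so ./app
#include <atomic>
#include <cstddef>
#include <cstdio>

extern "C" void* __libc_malloc(std::size_t size);   // glibc's real implementations
extern "C" void  __libc_free(void* ptr);

namespace {
    // Stand-in for the real collector: the actual library keeps per-backtrace
    // tables here instead of one global counter.
    std::atomic<long> outstanding{0};
    thread_local bool busy = false;     // re-entrancy guard (see the Caveats above)
}

extern "C" void* malloc(std::size_t size) noexcept
{
    void* p = __libc_malloc(size);
    if (p && !busy) {
        busy = true;
        ++outstanding;                  // "call into the reporting": record the event
        busy = false;
    }
    return p;
}

extern "C" void free(void* ptr) noexcept
{
    if (ptr && !busy) {
        busy = true;
        --outstanding;
        busy = false;
    }
    __libc_free(ptr);
}

// realloc, calloc and memalign would be wrapped the same way. A report entry
// point (in the real library, driven by mtrace()/muntrace()) could be:
extern "C" void memleak_report()
{
    std::fprintf(stderr, "allocations still outstanding: %ld\n", outstanding.load());
}
</syntaxhighlight>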

mallinfo
The <tt>mallinfo</tt> API is deprecated in newer versions of glibc (which provide <tt>mallinfo2</tt> instead) but, if available, it is very useful:
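
For instance, a minimal sketch (note the fields are plain <tt>int</tt>, so the values wrap for processes using more than 2 GiB; <tt>mallinfo2</tt> uses wider fields):

<syntaxhighlight lang="cpp">
#include <malloc.h>
#include <cstdio>

int main()
{
    struct mallinfo mi = mallinfo();

    std::printf("non-mmapped space from the system (arena): %d\n", mi.arena);
    std::printf("total allocated space (uordblks)         : %d\n", mi.uordblks);
    std::printf("total free space (fordblks)              : %d\n", mi.fordblks);
    std::printf("space in mmapped regions (hblkhd)        : %d\n", mi.hblkhd);
    return 0;
}
</syntaxhighlight>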

/proc
Coarse grained information can be obtained from /proc:
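
For example, a minimal sketch reading <tt>/proc/self/statm</tt> (values are in pages; the same files exist under <tt>/proc/<pid>/</tt> for other processes):

<syntaxhighlight lang="cpp">
#include <unistd.h>
#include <fstream>
#include <iostream>

int main()
{
    long size = 0, resident = 0, shared = 0;        // in pages
    std::ifstream statm("/proc/self/statm");
    statm >> size >> resident >> shared;

    const long page_kb = sysconf(_SC_PAGESIZE) / 1024;
    std::cout << "VmSize: " << size * page_kb << " kB, "
              << "VmRSS: "  << resident * page_kb << " kB, "
              << "shared: " << shared * page_kb << " kB\n";
    return 0;
}
</syntaxhighlight>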

Various tools

 * gdb-heap extension
 * Dr. Memory
 * Pin tool
 * heaptrack
 * Data-type profiling for perf
 * Type-preserving heap profiler
 * See https://stackoverflow.com/questions/18455698/lightweight-memory-leak-debugging-on-linux for a list of tools. Of these, memleax deserves a special mention for its out-of-process approach.