Linux Applications Debugging Techniques/Core files

A core dump is a snapshot of a program's memory, its processor registers (including the program counter and stack pointer), and other OS and memory-management information, taken at a certain point in time. As such, core dumps are invaluable for capturing the state of rarely occurring races and abnormal conditions.

What is more, such rarities are usually found on heavily used production or QA machines where gdb is not available and access to the machine is not easy. Worse, the heaviest users are usually the biggest clients (moneywise...). It is therefore important to plan ahead and collect as much forensic data as possible.

One can force a core dump from within the program or from outside at chosen moments. What a core cannot tell is how the application ended up in that state: the core is no replacement for a good log. Verbose logs and core files go hand in glove.

Prerequisites
For a process to be able to dump core, a few prerequisites have to be met:


 * the core size limit must permit it (see the man page for ulimit), e.g. ulimit -c unlimited. The limit can also be set from within the program (with setrlimit).
 * the process must have write permission on the directory where the core is to be written (usually the current working directory of the process)
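
Both prerequisites can be checked from the shell that launches the program. This is a minimal sketch: the function names `enable_core_dumps` and `can_dump_core_in` are illustrative and not part of any tool, and raising the soft limit only helps if the hard limit is not already zero.

```shell
# Raise the soft core-size limit as far as the hard limit allows.
# Unlike raising the hard limit, this needs no special privileges.
enable_core_dumps() {
    ulimit -S -c "$(ulimit -H -c)"
}

# Succeed only if the given directory exists and is writable by this user.
can_dump_core_in() {
    [ -d "$1" ] && [ -w "$1" ]
}
```

For example, `enable_core_dumps && can_dump_core_in "$PWD" && ./myprogram` starts the program with both conditions satisfied (`./myprogram` stands in for the real binary). The limit is inherited by the processes the shell starts.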

Where is my core?
By default the core is dumped in the current working directory of the process, but the OS can be configured otherwise through the kernel's core_pattern setting (/proc/sys/kernel/core_pattern).
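
On Linux the current setting can be inspected as sketched here. The helper names `show_core_pattern` and `core_is_piped` are made up for this illustration; the optional file argument exists only so the function can be tried against a sample file.

```shell
# Print the kernel's core file name pattern (Linux only).
# An optional argument names another file to read instead.
show_core_pattern() {
    cat "${1:-/proc/sys/kernel/core_pattern}"
}

# Succeed if the pattern pipes the core to a helper program
# instead of writing a file: such patterns start with '|'.
core_is_piped() {
    case "$1" in
        '|'*) return 0 ;;
        *)    return 1 ;;
    esac
}
```

A call such as `core_is_piped "$(show_core_pattern)" && echo "cores go to a helper program"` tells whether to look for a file at all. As root, something like `sysctl -w kernel.core_pattern=/var/cores/core.%e.%p` redirects dumps to a fixed directory, with `%e` expanding to the executable name and `%p` to the PID; the directory `/var/cores` is only an example.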

Dumping core from outside the program
One possibility is with gdb, if available. This will leave the program running.
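
A minimal sketch using the `gcore` command that ships with gdb. The wrapper name `dump_running_process` is hypothetical, and attaching normally requires being the same user as the target process or root.

```shell
# Usage: dump_running_process PID [PREFIX]
# Writes PREFIX.PID (default: core.PID) and leaves the process running.
dump_running_process() {
    if [ -z "${1:-}" ]; then
        echo "usage: dump_running_process PID [PREFIX]" >&2
        return 2
    fi
    gcore -o "${2:-core}" "$1"
}
```

The process is stopped only while the dump is written. From an interactive gdb session already attached to the process, the `generate-core-file` command (alias `gcore`) does the same job.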

Another possibility is to send the process a signal whose default action is to dump core, such as SIGABRT or SIGQUIT. This will terminate it, assuming the signal is not caught by a custom signal handler.
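
As a sketch, with `kill_with_core` being an illustrative name rather than an existing command:

```shell
# Usage: kill_with_core PID
# SIGABRT's default action is to terminate the process and dump core.
kill_with_core() {
    kill -s ABRT "$1"
}
```

`kill -s QUIT` works the same way; SIGQUIT is also what Ctrl-\ sends to the foreground process of a terminal.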

Dumping core from within the program
Again, there are two possibilities: dump core and terminate the program, or dump core and continue. To terminate, the program can call abort(). To continue, a helper such as dump_core_and_continue can call fork() and let the child dump core (for example by calling abort()) while the parent carries on.

Note: use dump_core_and_continue with care: in a multi-threaded program, the forked child will have only a clone of the parent thread that called fork [Butenhof Ch5; re: threads & fork]. This has a number of implications, in particular with respect to mutexes, but the particular point here is that the core dumped by the child will contain information for only one thread. If you need to dump a core with all threads without aborting the process, try the Google core dumper library, even though it has not been maintained for years.

Shared libraries
To obtain a good call stack, it is important that gdb loads the same libraries that were loaded by the program that generated the core dump. If the machine on which the core is analyzed has different libraries (or has them in different places) than the machine on which the core was dumped, copy the libraries over to the analysis machine in a layout that mirrors the dump machine.
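
One way to build such a mirror is sketched below. The function name `mirror_libs` and the directory layout are assumptions for illustration; the list of libraries could come from `ldd` on the dump machine.

```shell
# Usage: mirror_libs SYSROOT /abs/path/to/lib...
# Copies each library under SYSROOT, preserving its absolute path,
# so that /lib/libfoo.so.1 ends up as SYSROOT/lib/libfoo.so.1.
mirror_libs() {
    sysroot=$1
    shift
    for lib in "$@"; do
        mkdir -p "$sysroot$(dirname "$lib")" || return 1
        # -L copies the file a symlink points to, not the link itself.
        cp -L "$lib" "$sysroot$lib" || return 1
    done
}
```

Run on the dump machine, this produces a tree that can be archived with tar and unpacked anywhere on the analysis machine.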

At the gdb prompt:
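
A sketch of the commands involved, printed by a small helper so they can be saved to a file and replayed. The name `sysroot_commands` is hypothetical and the paths are assumed to contain no spaces.

```shell
# Usage: sysroot_commands SYSROOT PROGRAM CORE
# Prints gdb commands. Setting the sysroot first means gdb resolves
# the shared libraries from the mirror when it loads the core.
sysroot_commands() {
    printf 'set sysroot %s\n' "$1"
    printf 'file %s\n' "$2"
    printf 'core-file %s\n' "$3"
    # List the libraries gdb actually found, to verify the mirror is used.
    printf 'info sharedlibrary\n'
}
```

For instance, `sysroot_commands /tmp/sysroot ./app core.1234 > cmds.gdb` followed by `gdb -x cmds.gdb` runs the commands and leaves gdb at its prompt; the same lines can also be typed by hand. If the libraries were copied into a flat directory instead of a mirrored tree, `set solib-search-path` is the command to use.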

Source Code
To point the debugger to the source files:
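
A minimal sketch in the same style; `source_path_commands` is an illustrative name and the paths are assumed to contain no spaces.

```shell
# Usage: source_path_commands LOCAL_SRC_DIR [BUILD_SRC_DIR]
source_path_commands() {
    # Add the local checkout to gdb's source search path.
    printf 'directory %s\n' "$1"
    # Optionally rewrite the build-time prefix recorded in the debug info.
    if [ -n "${2:-}" ]; then
        printf 'set substitute-path %s %s\n' "$2" "$1"
    fi
}
```

The printed commands can be typed at the gdb prompt or loaded with `gdb -x`. The substitution helps when the binary was built in a directory (say, a build server path) that does not exist on the analysis machine.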

analyze-cores
Here is a script that will generate a basic report per core file. It is useful on the days when cores are raining on you:
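
A minimal sketch of such a script, assuming that all cores come from the same executable, which is passed as the first argument. The function name `analyze_cores` and the report file naming are choices made for this example.

```shell
# Usage: analyze_cores PROGRAM CORE...
# Writes CORE.report.txt next to each core file.
analyze_cores() {
    if [ "$#" -lt 2 ]; then
        echo "usage: analyze_cores PROGRAM CORE..." >&2
        return 2
    fi
    exe=$1
    shift
    for core in "$@"; do
        # -batch runs the commands and exits without prompting.
        gdb -batch \
            -ex 'info threads' \
            -ex 'thread apply all bt full' \
            -ex 'info sharedlibrary' \
            -ex 'info registers' \
            "$exe" "$core" > "$core.report.txt" 2>&1
    done
}
```

A typical call would be `analyze_cores ./app core.*`, producing one report per core with the thread list, full backtraces of all threads, the loaded libraries and the registers of the crashing thread.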

An alternative worth exploring is btparser.

Canned user-defined commands
The same reporting functionality can be canned for gdb:
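
As a sketch, the helper below prints a user-defined gdb command built with gdb's `define` and `document` syntax. The command name `corereport` and the helper name `print_corereport_definition` are made up for this example.

```shell
# Print the definition of a user-defined gdb command named "corereport".
print_corereport_definition() {
    cat <<'EOF'
define corereport
  info threads
  thread apply all bt full
  info sharedlibrary
  info registers
end
document corereport
Print a basic report: threads, full backtraces, libraries, registers.
end
EOF
}
```

Appending the output to the gdb startup file, for example with `print_corereport_definition >> ~/.gdbinit`, makes `corereport` available at the gdb prompt in every later session.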

analyze-pid
A script that will generate a basic report and a core file for a running process:
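
A minimal sketch, assuming gdb is installed and the caller is allowed to attach to the process. The function name `analyze_pid` and the output file names are choices made for this example.

```shell
# Usage: analyze_pid PID
# Writes report.PID.txt and core.PID in the current directory,
# then detaches so the process keeps running.
analyze_pid() {
    if [ -z "${1:-}" ]; then
        echo "usage: analyze_pid PID" >&2
        return 2
    fi
    gdb -p "$1" -batch \
        -ex 'info threads' \
        -ex 'thread apply all bt' \
        -ex "generate-core-file core.$1" \
        -ex 'detach' \
        > "report.$1.txt" 2>&1
}
```

The process is paused while gdb is attached, so on a busy production system the report commands are best kept short.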

Thread Local Storage
Thread-local storage (TLS) data is rather difficult to access with gdb when working from a core file: there is no live process, so functions such as __tls_get_addr cannot be called.

Links

 * crash tool
 * Kernel dumps