Introduction to Programming Languages/Heap

Heap
In this section we describe heap allocation, but, first, let's understand what is the heap. When the compiler reads the source code of a function, it knows exactly how much space on the stack that function is going to use. The compiler knows how many local variables the function has, the size of its return value, and so on.

When it comes to objects allocated on the heap, though, it's a bit different. Programmers tend to use the heap to allocate objects that depend on user input, therefore whose size is unknown at compilation time. If you ever programmed in C, whenever you used the keyword `malloc` (or `new`, in C++) or its variants, you allocated an object on the heap.

Heap allocation is quite challenging. In the next sections we will describe its main issues.

Placement
The first challenge on heap management has to do with where to put the allocated objects. There are basically three options: first fit, best fit, and worst fit. We will use an analogy to describe them.

Suppose you are going to the mall with family on a weekend. As the parking lot isn't as full as it is on a week day, you have lots of options to choose from. If it's a sunny day and you want to go inside the mall as soon as possible to have an ice cream with the kids, you will probably park on the first available spot you see. On the other hand, if you are not in a hurry for ice cream and wonder whether the parking lot is going to be full when you decide to go home, you will try to park on a spot with the more space available possible for you. Finally, if you are an expert driver and like a challenge, you will look for the spot that fits best the size of your car.

If the Operating System used a first fit policy, it would act like the eager-for-ice-cream driver, allocating objects on the first available spot it finds. If, on the other hand, the OS used a worst fit policy, it would act pretty much like the aware-of-full-parking-lot driver, allocating objects on the spot with as much available space as possible. Finally, the OS would act like the expert driver if it used a best fit policy, allocating objects on the available spot that best fits the objects' size. Usually, Operating Systems use a first fit policy.

Common Errors in Memory Management
Now we will lower the level a bit, letting aside cars, parking lots and whatnot, and focus on C programs.

Memory leak
When a programmer wants to use memory he/she can do it in several ways: one is by declaring static or global variables, in which case the variables will be placed in the static storage; another way is by declaring local variables inside functions, in which case the variables will be placed in the stack; finally, he/she can allocate variables dynamically, in which case the variables will be placed in the heap. In this last case the Operating System allocates space in the heap as the program being executed requests space for variables.

A memory leak occurs every time a programmer allocates memory and does not free that memory. We show an example below of such memory error. In this example, we request space for the variable i, but we do not `free` its space.

To compile the program above we do this:

$> gcc -g Leak.c

And by using valgrind we can find the error. For a more detailed tutorial on valgrind, see this. To find simple errors as the one above, you can just use:

$> valgrind -v ./a.out

Dangling pointer
Another common, but more subtle, memory error is called dangling pointer. Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations.

Dangling pointers arise when an object is deleted or deallocated, without modifying the value of the pointer, so that the pointer still points to the memory location of the deallocated memory. As the system may reallocate the previously freed memory to another process, if the original program then dereferences the (now) dangling pointer, unpredictable behavior may result, as the memory may now contain completely different data.

The following program contains a dangling pointer. Try to inspect it to find the problem.

We will try to explain in more details what is happening in the program above. The figure below shows what happens when line 2) is executed. First, memory is requested for variable i, and a block of memory (the gray one) becomes used, therefore, unless it is freed, the OS has to allocate variables in other blocks.



On line 3) we store a value into the block of memory used by i. The figure below shows it.



After storing a value into the block of memory used by i we release the space it was using. The figure below shows this situation.



Notice that i still points to the block, that is, i still contains the address of that block. Also, the value previously stored in it is still there.

Now, let's stop for a bit and think of what happens on line 6). First, assume we are using a first fit strategy to allocate variables in the heap. From that assumption, as the first block of memory (the one previously used by i) is free, we can allocate j on it. And that's what happens, as we show below.



Now a different value is stored into the block of memory pointed by j:



Finally, on line 8) we print i. Commom sense tells us that 3 or possibly another value would be printed, but now we are able to tell why the value printed is actually 8.