C Programming/stdlib.h/malloc

In computing, malloc is a subroutine for performing dynamic memory allocation. malloc is part of the C standard library and is declared in the stdlib.h header.

Many implementations of malloc are available, each of which performs differently depending on the computing hardware and how a program is written. Performance varies in both execution time and required memory. A pointer to memory allocated using malloc must eventually be passed to the free subroutine to deallocate the memory and so avoid memory leaks.

Rationale
The C programming language manages memory statically, automatically, or dynamically. Static-duration variables are allocated in main (fixed) memory and persist for the lifetime of the program; automatic-duration variables are allocated on the stack and come and go as functions are called and return. For static-duration variables and, before C99 (which allows variable-length automatic arrays), automatic-duration variables, the size of the allocation must be a compile-time constant. If the required size is not known until run time (for example, if data of arbitrary size is being read from the user or from a disk file), then fixed-size data objects are inadequate.

The lifetime of allocated memory is also a concern. Neither static- nor automatic-duration memory is adequate for all situations. Automatically allocated data cannot persist across multiple function calls, while static data persists for the life of the program whether it is needed or not. In many situations the programmer requires greater flexibility in managing the lifetime of allocated memory.

These limitations are avoided by using dynamic memory allocation, in which memory is more explicitly (but more flexibly) managed, typically by allocating it from the heap, an area of memory structured for this purpose. In C, the library function malloc is used to allocate a block of memory on the heap. The program accesses this block of memory via a pointer that malloc returns. When the memory is no longer needed, the pointer is passed to free, which deallocates the memory so that it can be used for other purposes.

Some platforms provide library calls which allow run-time dynamic allocation from the C stack rather than the heap (e.g. alloca() on Unix, _alloca() in the Microsoft Windows CRT). This memory is automatically freed when the calling function ends. The need for this is lessened by changes in the C99 standard, which added support for variable-length arrays of block scope having sizes determined at run time.

Dynamic memory allocation in C
The malloc function is one of the functions in standard C to allocate memory. The block it returns can be indexed through the pointer much like an array. Its function prototype is

void *malloc(size_t size);

which allocates size bytes of memory. If the allocation succeeds, a pointer to the block of memory is returned; otherwise a NULL pointer is returned. A successfully returned pointer is guaranteed to be suitably aligned for any type (including structs and the like).

Memory allocated via malloc is persistent: it will continue to exist, even after the program leaves the scope which called the allocation, until the program terminates or the memory is explicitly deallocated by the programmer. Deallocation is done with the free subroutine. Its prototype is

void free(void *ptr);

which releases the block of memory pointed to by ptr. ptr must have been previously returned by malloc, calloc, or realloc, and ptr must not be used after it has been passed to free. In particular, memory allocated via the C++ operators new or new[] should not be passed to free, and pointers which did not come from malloc (e.g. pointers to stack variables) or which have already been freed must not be passed to free. It is safe to call free on a NULL pointer, which has no effect.

Usage example
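The standard method of creating an array of 10 int objects is the declaration int array[10]; However, if one wishes to allocate a similar array dynamically, one calls malloc with the total size in bytes, as in int *array = malloc(10 * sizeof(int)); and checks the returned pointer before using it.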

malloc returns a null pointer to indicate that no memory is available, or that some other error occurred which prevented memory being allocated.
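A minimal sketch of the dynamic case described above, using only the standard library. The function names make_sequence and sum_of_ten are made up for illustration and are not part of any library.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates an array of `count` ints and fills it
 * with 0, 1, 2, ... Returns NULL if the allocation fails. */
int *make_sequence(size_t count)
{
    int *array = malloc(count * sizeof(int));
    if (array == NULL) {
        /* Memory could not be allocated; let the caller handle it. */
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        array[i] = (int)i;
    }
    return array;
}

/* Sums a dynamically allocated array of 10 ints, then frees it.
 * Returns -1 if the allocation failed. */
int sum_of_ten(void)
{
    int *array = make_sequence(10);
    if (array == NULL) {
        return -1;
    }
    int sum = 0;
    for (size_t i = 0; i < 10; i++) {
        sum += array[i];
    }
    free(array);   /* release the block once it is no longer needed */
    return sum;
}
```

Unlike int array[10], the block outlives the function that allocated it, which is why sum_of_ten has to free it explicitly.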

Since there is a possibility that a call to malloc may fail for lack of sufficient memory, it is often convenient to define a macro that invokes malloc and exits when malloc fails. With such a macro, the allocation and the null check can be replaced with a single line.
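One possible shape for such a macro is sketched below. The name MALLOC_OR_EXIT, the error message, and the demo function are assumptions made for illustration; the original macro text is not reproduced here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical macro: assigns malloc(size) to ptr and terminates the
 * program if the allocation fails. The do/while (0) wrapper makes the
 * macro behave like a single statement. Note that both arguments are
 * evaluated more than once, so they should have no side effects. */
#define MALLOC_OR_EXIT(ptr, size)                                  \
    do {                                                           \
        (ptr) = malloc(size);                                      \
        if ((ptr) == NULL) {                                       \
            fprintf(stderr, "cannot allocate %lu bytes\n",         \
                    (unsigned long)(size));                        \
            exit(EXIT_FAILURE); /* insufficient memory */          \
        }                                                          \
    } while (0)

/* Example use: allocate ten ints in one line, store a value, free. */
int macro_demo(void)
{
    int *ptr;
    MALLOC_OR_EXIT(ptr, 10 * sizeof *ptr);
    ptr[9] = 42;
    int value = ptr[9];
    free(ptr);
    return value;
}
```

Exiting on failure is only reasonable for small programs; library code usually reports the failure to its caller instead.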

A useful idiom with malloc is to compute the size from the pointer being assigned, as in int *ptr = malloc(10 * sizeof *ptr);

That is, instead of writing a hard-wired type into the argument to malloc, one uses the sizeof operator on the content of the pointer to be allocated. This ensures that the types on the left and right of the assignment will never get out of sync when code is revised.
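The idiom pays off most with structure types. The sketch below uses a made-up struct point and a hypothetical helper alloc_points purely for illustration.

```c
#include <stdlib.h>

/* Hypothetical record type used only for illustration. */
struct point {
    double x;
    double y;
};

/* Allocates `count` points. Because the size is written as
 * sizeof *pts, changing the element type of pts later does not
 * require touching the malloc argument. */
struct point *alloc_points(size_t count)
{
    struct point *pts = malloc(count * sizeof *pts);
    return pts;   /* NULL on failure; the caller must free() the result */
}
```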

A C function can also create and return a two-dimensional array of size m × n, for example by allocating an array of m row pointers and then n elements for each row.
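The following is one way such a function might look, not necessarily the version originally intended here. The names create_matrix and free_matrix are assumptions, and calloc is used for the rows so that every element starts at zero.

```c
#include <stdlib.h>

/* Frees a matrix made by create_matrix; `rows` is the number of rows
 * that were successfully allocated. */
void free_matrix(int **matrix, size_t rows)
{
    if (matrix == NULL) {
        return;
    }
    for (size_t i = 0; i < rows; i++) {
        free(matrix[i]);
    }
    free(matrix);
}

/* Creates an m-by-n matrix of ints, all zero. Returns NULL on failure. */
int **create_matrix(size_t m, size_t n)
{
    int **matrix = malloc(m * sizeof *matrix);   /* m row pointers */
    if (matrix == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < m; i++) {
        matrix[i] = calloc(n, sizeof *matrix[i]); /* one row of n ints */
        if (matrix[i] == NULL) {
            free_matrix(matrix, i);  /* frees rows 0..i-1 and the pointer array */
            return NULL;
        }
    }
    return matrix;
}
```

Elements are then accessed as matrix[i][j], and the whole structure is released with free_matrix(matrix, m).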

Casting and type safety
malloc returns a void pointer (void *), which indicates that it is a pointer to a region of unknown data type. The lack of a specific pointer type returned from malloc is type-unsafe behaviour: malloc allocates based on byte count but not on type. This distinguishes it from the C++ new operator, which returns a pointer whose type depends on the operand (see C Type Safety).

One may "cast" (see type conversion) this pointer to a specific type, for example by writing (int *)malloc(10 * sizeof(int)).
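The two spellings can be compared side by side. This is only an illustrative sketch; the function name cast_demo is made up.

```c
#include <stdlib.h>

/* Allocates the same block twice, once without and once with a cast.
 * Returns 1 if both allocations succeeded, 0 otherwise. */
int cast_demo(void)
{
    int *a = malloc(10 * sizeof *a);          /* without a cast */
    int *b = (int *)malloc(10 * sizeof *b);   /* with a cast */
    int ok = (a != NULL && b != NULL);
    free(a);   /* free(NULL) is harmless if an allocation failed */
    free(b);
    return ok;
}
```

In C both forms compile and behave identically; only the second is accepted by a C++ compiler.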

There are advantages and disadvantages to performing such a cast.

Advantages to casting

 * Including the cast allows for compatibility with C++, which does require the cast to be made.
 * If the cast is present and the type of the left-hand-side pointer is subsequently changed, a warning will be generated to help the programmer in correcting behaviour that otherwise could become erroneous.
 * The cast allows for older versions of malloc that originally returned a char *.

Disadvantages to casting

 * Under the ANSI C standard, the cast is redundant.
 * Adding the cast may mask failure to include the header stdlib.h, in which the prototype for malloc is found. In the absence of a prototype for malloc, the C89 standard requires that the C compiler assume malloc returns an int. If there is no cast, a warning is issued when this integer is assigned to the pointer; however, with the cast, this warning is not produced, hiding a bug. On certain architectures and data models (such as LP64 on 64-bit systems, where long and pointers are 64-bit and int is 32-bit), this error can actually result in undefined behaviour, as the implicitly declared malloc returns a 32-bit value whereas the actually defined function returns a 64-bit value. Depending on calling conventions and memory layout, this may result in stack smashing. This issue is less of a concern with modern compilers, as they generally warn that an undeclared function has been used, so a warning still appears. For example, gcc's default behaviour is to show a warning that reads "incompatible implicit declaration of built-in function" regardless of whether the cast is present or not.

calloc
malloc returns a block of memory that is allocated for the programmer to use, but is uninitialized. The memory is usually initialized by hand if necessary, either via the memset function or by one or more assignment statements that dereference the pointer. An alternative is to use the calloc function, which allocates contiguous memory and then initializes it to zero. Its prototype is

void *calloc(size_t nelements, size_t elementSize);

which allocates a region of memory, initialized to 0, of size nelements × elementSize. This can be useful when allocating an array of characters to hold a string, since the buffer then starts out as an empty, null-terminated string.
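A small sketch of the string case follows. The helper name new_string_buffer is hypothetical, and the code assumes max_len + 1 does not overflow.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a zero-filled buffer able to hold a string of up to
 * `max_len` characters plus the terminating null character,
 * or NULL if the allocation fails. */
char *new_string_buffer(size_t max_len)
{
    return calloc(max_len + 1, sizeof(char));
}
```

Because every byte starts at zero, copying at most max_len characters into the buffer with strncpy always leaves it null-terminated.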

realloc
It is often useful to be able to shrink or enlarge a block of memory. This can be done using realloc, which returns a pointer to a memory region of the specified size that contains the same data as the old region pointed to by ptr (truncated to the minimum of the old and new sizes). If realloc is unable to resize the memory region in place, it allocates new storage, copies the required data, and frees the old pointer. If this allocation fails, realloc leaves the original block and pointer unaltered and returns the null pointer value. In case of expansion, the new region of memory outside the old data that is copied is uninitialized (contents are not predictable). The function prototype is

void *realloc(void *ptr, size_t size);

If ptr is NULL then realloc behaves like malloc of the given size, so realloc(NULL, 42) is equivalent to malloc(42).

In both C89 and C99, realloc with length 0 is a special case. The C89 standard explicitly states that the pointer given is freed, and that the return is either a null pointer or a pointer to the newly allocated space. The C99 standard says that the behavior is implementation-defined. It is possible that malloc and realloc with size 0 return different (null and non-null) pointers. Other standards, such as the Open Group's UNIX standards, make it implementation-defined whether realloc frees ptr without allocating new space, or frees ptr and returns a valid pointer to at least zero bytes of memory. Under all standards, NULL can be returned on memory allocation failure.

When using realloc, one should always store the result in a temporary variable and assign it to the original pointer only after checking that it is not NULL. If instead one wrote p = realloc(p, size); and it was not possible to obtain size bytes of memory, then p would have the value NULL and there would no longer be a pointer to the memory previously allocated for p, creating a memory leak (see below).
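The temporary-variable pattern can be wrapped in a small helper. The name resize_ints is an assumption made for this sketch, and the helper assumes new_count is greater than zero, since the zero-size case is implementation-defined.

```c
#include <stdlib.h>

/* Hypothetical helper: resizes *buffer to hold new_count ints
 * (new_count must be greater than 0).
 * Returns 1 on success. Returns 0 on failure, in which case *buffer
 * still points to the original, untouched block. */
int resize_ints(int **buffer, size_t new_count)
{
    int *tmp = realloc(*buffer, new_count * sizeof **buffer);
    if (tmp == NULL) {
        return 0;      /* the old block is still valid and still owned */
    }
    *buffer = tmp;     /* only now overwrite the original pointer */
    return 1;
}
```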

Common errors
The improper use of malloc and related functions can frequently be a source of bugs.

Allocation failure
malloc is not guaranteed to succeed. If there is no memory available, or if the program has exceeded the amount of memory it is allowed to reference, malloc will return a null pointer, which should always be checked for after allocation. Many badly coded programs do not check for malloc failure. Such a program would attempt to use the null pointer returned by malloc as if it pointed to allocated memory. Most likely the program will crash; in some environments, particularly older or smaller platforms that perform no virtual memory management, zero is a valid address so the problem will not be caught.

Memory leaks
When a call to malloc, calloc or realloc succeeds, the returned pointer to the allocated memory should eventually be passed to the free function. This releases the allocated memory, allowing it to be reused to satisfy other memory allocation requests. If this is not done, the allocated memory will not be released until the process exits (and in some environments, not even then); in other words, a memory leak will occur. Typically, memory leaks are caused by losing track of pointers, for example by not using a temporary pointer for the return value of realloc, as in ptr = realloc(ptr, size); which overwrites the original pointer with a null pointer if the call fails.

Use after free
After a pointer has been passed to free, it becomes a dangling pointer: it references a region of memory with undefined content, which may not be available for use. Neither the memory nor the pointer's value may be accessed. For example, writing *ptr = 0; after free(ptr); has undefined behavior: its effect may vary. Even trying to read the value of a freed pointer can result in undefined behaviour.

Commonly, the system may have reused freed memory for other purposes. Therefore, writing through a pointer to a deallocated region of memory may result in overwriting another piece of data somewhere else in the program. Depending on what data is overwritten, this may result in data corruption or cause the program to crash at a later time. A particularly bad example of this problem is if the same pointer is passed to free twice, known as a double free. Use-after-free and double-free bugs can lead to security vulnerabilities. To avoid this, some programmers set pointers to NULL after passing them to free, as in free(ptr); ptr = NULL; However, this will not protect other aliases to the same pointer from being misused.
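The habit of clearing a pointer after freeing it is sometimes packaged as a macro. The sketch below is one way to do so; FREE_AND_NULL and clear_demo are made-up names.

```c
#include <stdlib.h>

/* Hypothetical macro: frees p and then sets it to NULL, so that a
 * later accidental free(p) becomes a harmless free(NULL). */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Returns 1 if the pointer is NULL after being freed and cleared. */
int clear_demo(void)
{
    int *data = malloc(4 * sizeof *data);
    FREE_AND_NULL(data);
    FREE_AND_NULL(data);   /* second call is safe: free(NULL) does nothing */
    return data == NULL;
}
```

Any other variable that still holds a copy of the old address is unaffected and remains dangling.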

Freeing unallocated memory
Another problem is when free is passed an address that was not allocated by malloc, realloc or calloc. This can be caused when a pointer to a literal string or the name of a declared array is passed to free, for example after the declarations char *msg = "Default message"; and int tbl[100]; Passing either msg or tbl to free will result in undefined behaviour.

A common error is to free the memory and then use it, for example by calling free(ptr) and afterwards reading ptr[0]. This is called "use after free". Such code will appear to run on many systems, because free often does not change the contents of the memory being freed. The C standard does not guarantee this behavior, so the code can fail on other systems.

When a function returns a pointer to allocated memory, the usual practice is to put the pointer returned into a variable, use the memory, then free it using the pointer.
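The three steps can be sketched as follows. Both function names are hypothetical; copy_string simply stands in for any function that hands ownership of an allocation to its caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap-allocated copy of `text`,
 * or NULL on failure. The caller owns the result and must free it. */
char *copy_string(const char *text)
{
    size_t len = strlen(text) + 1;   /* include the terminating null */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, text, len);
    }
    return copy;
}

/* Returns the length of a temporary copy of `text`, or 0 on failure. */
size_t copy_and_measure(const char *text)
{
    char *copy = copy_string(text);  /* 1. keep the returned pointer */
    if (copy == NULL) {
        return 0;
    }
    size_t n = strlen(copy);         /* 2. use the memory */
    free(copy);                      /* 3. free it through the same pointer */
    return n;
}
```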

Freeing memory twice
In some programs, memory blocks are freed twice. This usually happens because of confusion over which function is responsible for deallocating the memory, for example when a helper function frees a buffer that its caller later frees again. This is called "double free" or "multiple free". Such code will run on some systems, but may break on the second free.

Implementations
The implementation of memory management depends greatly upon operating system and architecture. Some operating systems supply an allocator for malloc, while others supply functions to control certain regions of data. The same dynamic memory allocator is often used to implement both malloc and the operator new in C++. Hence, it is referred to below as the allocator rather than malloc.

Heap-based
Implementation of the allocator on IA-32 architectures is commonly done using the heap, or data segment. The allocator will usually expand and contract the heap to fulfill allocation requests.

The heap method suffers from a few inherent flaws, stemming entirely from fragmentation. Like any method of memory allocation, the heap will become fragmented; that is, there will be sections of used and unused memory in the allocated space on the heap. A good allocator will attempt to find an unused area of already allocated memory to use before resorting to expanding the heap. The major problem with this method is that the heap has only two significant attributes: base, or the beginning of the heap in virtual memory space; and length, or its size. The heap requires enough system memory to fill its entire length, and its base can never change. Thus, any large areas of unused memory are wasted. The heap can get "stuck" in this position if a small used segment exists at the end of the heap, which could waste any magnitude of address space, from a few megabytes to a few hundred.

dlmalloc and its derivatives
Doug Lea is the author of a memory allocator called dlmalloc ("Doug Lea's Malloc") whose source code describes itself as:

"This is not the fastest, most space-conserving, most portable, or most tunable malloc ever written. However it is among the fastest  while also being among the most space-conserving, portable and  tunable.  Consistent balance across these factors results in a good  general-purpose allocator for malloc-intensive programs."

The first implementation of dlmalloc was created in 1987. It is written in C and is highly portable, being known to work well on all major operating systems and processor architectures and on systems ranging from medium/small embedded right up to supercomputers. Due to its longevity and open source nature, dlmalloc is widely used for teaching purposes and as a foundation for other allocators, the best known of which is ptmalloc2/ptmalloc3. Since the v2.3 release, the GNU C library (glibc) uses a modified ptmalloc2, which itself is based on dlmalloc v2.7.0.

Another, lesser known dlmalloc derivative is nedmalloc which is based on dlmalloc v2.8.4 and is essentially dlmalloc wrapped by a per-thread lookaside cache to improve execution concurrency.

Memory on the heap is allocated as "chunks", an 8-byte aligned data structure which contains a header and usable memory. Allocated memory contains an 8 or 16 byte overhead for the size of the chunk and usage flags. Unallocated chunks also store pointers to other free chunks in the usable space area, making the minimum chunk size 24 bytes.

Unallocated memory is grouped into "bins" of similar sizes, implemented by using a doubly linked list of chunks (with pointers stored in the unallocated space inside the chunk).

For requests below 256 bytes (a "smallbin" request), a simple power-of-two best-fit allocator is used. A special CPU instruction (exposed by GCC as __builtin_clz) is used to very quickly find the first (highest) set bit, and a block is returned from the bin corresponding to that top bit position. If there are no free blocks in that bin, a block from the next highest bin is split in two.
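The bit trick can be illustrated in isolation. The sketch below is not dlmalloc's actual code; the function name top_bit_index is made up, and __builtin_clz is a GCC and Clang extension rather than standard C.

```c
#include <limits.h>

/* Returns the 0-based position of the highest set bit in `size`.
 * `size` must be nonzero, because __builtin_clz(0) is undefined.
 * A power-of-two allocator could use this value as a bin index. */
unsigned top_bit_index(unsigned size)
{
    unsigned bits = (unsigned)(sizeof(unsigned) * CHAR_BIT);
    return (bits - 1u) - (unsigned)__builtin_clz(size);
}
```

For instance, a 24-byte request (binary 11000) has its top bit at position 4, which would select the bin covering sizes 16 to 31.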

For requests of 256 bytes or above but below the mmap threshold, i.e. what dlmalloc calls a "largebin" request, recent versions of dlmalloc use an in-place bitwise trie algorithm. This traverses a binary tree on the basis of each bit state after the first set bit which modern out-of-order CPUs can do very efficiently and which is mostly invariant to the number of allocated blocks. A big advantage of this algorithm is that if it fails to find the size requested, it returns the block with the next biggest size which makes it very easy to split that block into the size required.

For requests above the mmap threshold, the memory is always allocated using the mmap system call. The threshold is 256 KB (1 MB on ptmalloc2) by default, but can be changed by calling the mallopt function. The mmap method averts problems with huge buffers trapping a small allocation at the end after their expiration, but always allocates whole pages of memory, which on many architectures are 4096 bytes in size.

If a request below the threshold cannot be satisfied from the available free space, dlmalloc may use the brk system call (usually through the sbrk library wrapper) to ask the Linux kernel to increase the size of the heap. Increasing the size of the heap increases the size of the top-most chunk (wilderness chunk), which is always unallocated, and is treated specially by malloc.

dlmalloc has a fairly weak free space segment coalescer algorithm, mainly because free space coalescing tends to be extremely slow due to causing TLB cache exhaustion. It is called every (by default) 4096 free operations and it works by iterating over each of the segments previously requested from the system which were not contiguously returned by the system. It tries to identify large ranges of memory which contain no allocated blocks and breaks such a segment into two, with the free memory being returned to the system. This algorithm works well if dlmalloc is the sole user of the VM system; however, if dlmalloc is used simultaneously with another allocator, then dlmalloc's free space coalescer can fail to correctly identify opportunities for free memory release.

The glibc implementation of ptmalloc2 differs from dlmalloc in this case, as it generally requests extra memory from the kernel using mmap to allocate 1 MB aligned chunks, which its source code refers to as arenas. ptmalloc2 tries to ensure separate arenas per execution thread, thus permitting concurrency within the memory allocator.

ptmalloc3 improves significantly on ptmalloc2 by making the smallbins of dlmalloc (described above) per-thread. This allows ptmalloc3 to offer lock free concurrency for smaller blocks while still allowing separate arenas for largebin allocations. Allocations beyond the mmap threshold still route exclusively through mmap.

nedmalloc is similar to ptmalloc2 in its support of multiple per-thread arenas, but it also adds a separate per-thread lookaside cache for smaller sized blocks which avoids processor serialisation for typical C++ usage patterns much as ptmalloc3 does. nedmalloc, like Hoard referred to below, is able to patch Microsoft Windows binaries to replace the system allocator with itself within a given process. The most recent versions of nedmalloc implement a user mode page allocator which replaces mmap, thus allowing memory pages to be held in a lookaside cache and therefore greatly improving the speed of large allocations.

All of dlmalloc, ptmalloc2, ptmalloc3 and nedmalloc are licensed under an open source licence and are therefore available for student study.

FreeBSD's and NetBSD's jemalloc
Since FreeBSD 7.0 and NetBSD 5.0, the old malloc implementation (phkmalloc) was replaced by jemalloc, written by Jason Evans. The main reason for this was a lack of scalability of phkmalloc in terms of multithreading. In order to avoid lock contention, jemalloc uses separate "arenas" for each CPU. Experiments measuring the number of allocations per second in multithreaded applications have shown that this makes it scale linearly with the number of threads, while for both phkmalloc and dlmalloc performance was inversely proportional to the number of threads.

On Windows and Linux, Firefox 3 beta4pre and later use jemalloc as the default allocator instead of the one provided by the operating system; Mac OS X is the exception. This improves performance and lowers memory consumption, due to less fragmentation.

OpenBSD's malloc
OpenBSD's implementation of the malloc function makes use of mmap. For requests greater in size than one page, the entire allocation is retrieved using mmap; smaller sizes are assigned from memory pools maintained by malloc within a number of "bucket pages," also allocated with mmap. On a call to free, memory is released and unmapped from the process address space using munmap. This system is designed to improve security by taking advantage of the address space layout randomization and gap page features implemented as part of OpenBSD's mmap system call, and to detect use-after-free bugs: as a large memory allocation is completely unmapped after it is freed, further use causes a segmentation fault and termination of the program.
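The underlying mmap and munmap calls can be tried directly on a POSIX system. The sketch below is not OpenBSD's allocator code; map_block and unmap_block are hypothetical wrappers, and anonymous private mappings are assumed to be available (as MAP_ANONYMOUS or MAP_ANON).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON
#endif

/* Hypothetical wrapper: maps `length` bytes of zero-filled, private,
 * readable and writable memory. Returns NULL on failure. */
void *map_block(size_t length)
{
    void *p = mmap(NULL, length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hypothetical wrapper: removes the mapping again. Returns 0 on
 * success. Any later access to the block is expected to fault. */
int unmap_block(void *block, size_t length)
{
    return munmap(block, length);
}
```

Returning memory to the system this way is what turns a later use-after-free on a large block into an immediate fault rather than silent corruption.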

Hoard's malloc
The Hoard memory allocator is an allocator whose goal is scalable memory allocation performance. Like OpenBSD's allocator, Hoard uses mmap exclusively, but manages memory in chunks of 64 kilobytes called superblocks. Hoard's heap is logically divided into a single global heap and a number of per-processor heaps. In addition, there is a thread-local cache that can hold a limited number of superblocks. By allocating only from superblocks on the local per-thread or per-processor heap, and moving mostly-empty superblocks to the global heap so they can be reused by other processors, Hoard keeps fragmentation low while achieving near linear scalability with the number of threads.

Thread-caching malloc
In TCMalloc, every thread has local storage for small allocations. For large allocations, mmap or sbrk can be used. TCMalloc has garbage collection for the local storage of dead threads. TCMalloc is considered to be more than twice as fast as glibc's ptmalloc for multithreaded programs.

In-kernel
Operating system kernels need to allocate memory just as application programs do. The implementation of malloc within a kernel often differs significantly from the implementations used by C libraries, however. For example, memory buffers might need to conform to special restrictions imposed by DMA, or the memory allocation function might be called from interrupt context. This necessitates a malloc implementation tightly integrated with the virtual memory subsystem of the operating system kernel.

In Linux and Unix kmalloc and kfree provide the functionality of malloc and free in the kernel. In Windows drivers, ExAllocatePoolWithTag, ExAllocatePoolWithQuotaTag and ExFreePoolWithTag provide the malloc/free semantic functionality in kernel mode.

Allocation size limits
The largest possible memory block malloc can allocate depends on the host system, particularly the size of physical memory and the operating system implementation. Theoretically, the largest number should be the maximum value that can be held in a size_t type, which is an implementation-dependent unsigned integer type representing the size of an area of memory. The maximum value is 2^(CHAR_BIT × sizeof(size_t)) - 1, or the constant SIZE_MAX in the C99 standard.
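One practical consequence is that a byte count computed as count × element size can wrap around if it exceeds SIZE_MAX. The sketch below guards against that; alloc_array is a hypothetical helper, not a standard function.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for `count` elements of
 * `elem_size` bytes each. Returns NULL if the total size would not
 * fit in a size_t or if malloc itself fails. */
void *alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;   /* count * elem_size would overflow size_t */
    }
    return malloc(count * elem_size);
}
```

calloc is expected to perform an equivalent check internally, which is one reason to prefer it when allocating arrays.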