PostgreSQL/Index Hash

PostgreSQL uses two fundamental strategies to implement Hash indices. First, a hash function maps column values of any type and length to a hash value of a fixed size of 32 bit. Such hash values together with the TIDs of their originating rows are the basic bricks for the Hash index. Second, an elaborated algorithm ensures that the size of the index file grows smoothly (that is, in a small amount of pages at one point in time) when additional index entries occur. Hence, it's an extendible hash.

To save space, the hash index doesn't store the original column value but only the computed hash value. This has some implications. The sort order of the computed hash values haven't any relation to the sort order of the original values. Therefore this index type can support only the  operator, but none of the other comparison operators like   or. Additionally, there is the danger of duplicates. Two different column values can create the same hash value. This is unavoidable because there are many more possible column values (any length) than possible hash values (fixed size). Thus, after reading the row according to the found TID, it's necessary to re-evaluate the column value from the heap.

In most cases - especially for long column values - the size of a hash index is smaller than the size of a B-tree index. Also, the execution time of reading and writing commands is often shorter.

The central part of a hash index consists of so-called buckets. They are a double-linked list of pages, where bucket entries are stored. The first page of a bucket can be accessed very fast because of a correlation between its number and certain bits of the hash value.

Description
During the creation and maintaining of a Hash index, several steps are conducted:
 * The column value and the TID of its original row are read.
 * A hash functions computes a hash value out of the original column value. Its size is always 32 bit - independent of its data type and data length.
 * The combination of hash value and TID builds a bucket entry.
 * There are several buckets, where the bucket entries are stored. Certain bits of the hash value determine which bucket owns and receives the bucket entry.
 * Buckets consist of a primary bucket page plus optional overflow bucket pages. Within the bucket, pages are linked together via a double-linked list.
 * If the primary bucket page and all its overflow pages don't offer a free slot for the new bucket entry, a new overflow page is created.
 * The ratio of the number of existing buckets to all bucket entries is computed. Depending on this value and the chosen  of the index, new buckets are created on-demand.



The meta page contains statistical data about the index: number of buckets and bucket entries, an array of links to buckets (hashm_spares), and more.

The primary and overflow bucket pages contain pairs of hash values and TIDs that point to rows of the heap.

The bitmap pages contain an array of bits, which indicates that there may be unused overflow pages (after DELETE operations to the according rows). Those pages can be reused by other buckets.