Data Structures/Trees

Trees
A tree is a non-empty set with an element that is designated as the root of the tree while the remaining elements are partitioned into non-empty sets each of which is a subtree of the root.

Tree nodes have many useful properties. The depth of a node is the length of the path (the number of edges) from the root to that node. The height of a node is the length of the longest path from that node down to a leaf. The height of a tree is the height of its root. A leaf node has no children; its only path is up to its parent.
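
As a minimal sketch of these definitions, assuming a binary node with `left` and `right` links (the `Node` class and the function names are illustrative, not part of the original text), depth and height can be computed like this:

```python
class Node:
    """Hypothetical binary-tree node used only for this illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def height(node):
    """Number of edges on the longest path from node down to a leaf."""
    if node is None:
        return -1  # convention assumed here: an empty subtree has height -1
    return 1 + max(height(node.left), height(node.right))

def depth(root, target):
    """Number of edges from root down to target, or None if target is absent."""
    if root is None:
        return None
    if root is target:
        return 0
    for child in (root.left, root.right):
        d = depth(child, target)
        if d is not None:
            return d + 1
    return None

# A small example tree: 50 at the root, 20 as one of the leaves.
leaf = Node(20)
root = Node(50, Node(30, leaf, Node(40)), Node(90, None, Node(100)))
tree_height = height(root)      # height of the tree = height of the root
leaf_depth = depth(root, leaf)  # edges from the root down to the leaf 20
```

With this convention a leaf has height 0, so the height of the tree equals the depth of its deepest leaf.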

See the axiomatic development of trees and its consequences for more information.

Types of trees:

Binary: Each node has zero, one, or two children. This restriction makes many tree operations simple and efficient.

Binary Search: A binary tree in which every value in a node's left subtree is less than the node's value, and every value in its right subtree is greater than or equal to it.

Traversal
Many problems require that we visit the nodes of a tree in a systematic way, for tasks such as counting how many nodes exist or finding the maximum element. Three depth-first methods are possible for binary trees: preorder, postorder, and in-order. All three do the same three things: recursively traverse the left subtree, recursively traverse the right subtree, and visit the current node. The difference is when the algorithm visits the current node. A fourth method, level order, visits the nodes breadth-first instead:

preorder: Current node, left subtree, right subtree (DLR)

postorder: Left subtree, right subtree, current node (LRD)

in-order: Left subtree, current node, right subtree (LDR)

levelorder: Level by level, from left to right, starting from the root node.


 * Visit means performing some operation involving the current node of a tree, like incrementing a counter or checking if the value of the current node is greater than any other recorded.

Sample implementations for Tree Traversal
preorder(node)
  visit(node)
  if node.left ≠ null then preorder(node.left)
  if node.right ≠ null then preorder(node.right)

inorder(node)
  if node.left ≠ null then inorder(node.left)
  visit(node)
  if node.right ≠ null then inorder(node.right)

postorder(node)
  if node.left ≠ null then postorder(node.left)
  if node.right ≠ null then postorder(node.right)
  visit(node)

levelorder(root)
  queue q
  q.push(root)
  while not q.empty do
    node = q.pop
    visit(node)
    if node.left ≠ null then q.push(node.left)
    if node.right ≠ null then q.push(node.right)

For an algorithm that is less taxing on the stack, see Threaded Trees.

Examples of Tree Traversals


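For the tree with root 50, whose left child 30 has the children 20 and 40 and whose right child 90 has the single child 100:

preorder: 50, 30, 20, 40, 90, 100

inorder: 20, 30, 40, 50, 90, 100

postorder: 20, 40, 30, 100, 90, 50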
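
The four traversals can be sketched in runnable Python as follows. This is an illustrative sketch rather than a reference implementation: the `Node` class, the `collect` helper and the use of `collections.deque` as the queue are assumptions made for the example.

```python
from collections import deque

class Node:
    """Hypothetical binary-tree node used only for this illustration."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def preorder(node, visit):
    if node is None:
        return
    visit(node.value)              # current node first
    preorder(node.left, visit)
    preorder(node.right, visit)

def inorder(node, visit):
    if node is None:
        return
    inorder(node.left, visit)
    visit(node.value)              # current node between the subtrees
    inorder(node.right, visit)

def postorder(node, visit):
    if node is None:
        return
    postorder(node.left, visit)
    postorder(node.right, visit)
    visit(node.value)              # current node last

def levelorder(root, visit):
    queue = deque()
    if root is not None:
        queue.append(root)
    while queue:
        node = queue.popleft()     # first in, first out
        visit(node.value)
        if node.left is not None:
            queue.append(node.left)
        if node.right is not None:
            queue.append(node.right)

def collect(traversal, root):
    """Run a traversal and return the visited values as a list."""
    out = []
    traversal(root, out.append)
    return out

root = Node(50, Node(30, Node(20), Node(40)), Node(90, None, Node(100)))
print(collect(preorder, root))     # [50, 30, 20, 40, 90, 100]
print(collect(inorder, root))      # [20, 30, 40, 50, 90, 100]
print(collect(postorder, root))    # [20, 40, 30, 100, 90, 50]
print(collect(levelorder, root))   # [50, 30, 90, 20, 40, 100]
```

Here "visit" is simply appending the value to a list; any other per-node operation can be passed in its place.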

Balancing
When entries that are already sorted are stored in a tree, every new record takes the same route, and the tree comes to look like a list (such a tree is called a degenerate tree). The tree therefore needs balancing routines, which keep roughly the same number of records under each branch so that searching stays fast. Specifically, if a tree with n nodes is degenerate, the longest path through the tree is n nodes; if it is balanced, the longest path is about log n nodes (logarithm base 2).
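
The effect can be observed with a small experiment. The sketch below is illustrative only: it uses a plain, unbalanced binary search tree, and the names `insert`, `build` and `longest_path_nodes` are assumptions made for the example. It inserts the same 1000 keys once in sorted order and once in shuffled order, then counts the nodes on the longest root-to-leaf path.

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Plain (unbalanced) BST insert, iterative to avoid deep recursion."""
    new = Node(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root

def longest_path_nodes(root):
    """Number of nodes on the longest root-to-leaf path (explicit stack)."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, count = stack.pop()
        best = max(best, count)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, count + 1))
    return best

n = 1000
sorted_keys = list(range(n))
shuffled_keys = sorted_keys[:]
random.Random(0).shuffle(shuffled_keys)  # fixed seed, chosen arbitrarily

degenerate = longest_path_nodes(build(sorted_keys))   # every key goes right
shuffled = longest_path_nodes(build(shuffled_keys))   # far shorter in practice
```

Sorted input produces a path containing all n nodes, while shuffled input typically stays within a small multiple of log n. Shuffling is not a balancing routine; it only illustrates how much the insertion order matters.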

Algorithms/Left rotation shows how balancing is applied to establish the priority heap invariant in a treap, a data structure that has the queueing performance of a heap and the key lookup performance of a tree. A balancing operation can change the tree structure while maintaining another order, namely the binary tree sort order. The binary tree order runs left to right, with left nodes' keys less than right nodes' keys, whereas the priority order runs up and down, with higher nodes' priorities greater than lower nodes' priorities. Alternatively, the priority can be viewed as another ordering key, except that finding a specific key is more involved.

The balancing operation can move nodes up and down a tree without affecting the left-to-right ordering.
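
A rotation is the basic operation that moves nodes up and down while leaving the left-to-right order untouched. The following is a minimal sketch, with illustrative names and an arbitrary five-key tree, that performs a left rotation and compares the in-order sequence before and after:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_left(x):
    """Lift x's right child above x; return the new subtree root."""
    y = x.right
    x.right = y.left   # y's left subtree lies between x and y in key order
    y.left = x
    return y

def rotate_right(y):
    """Mirror image: lift y's left child above y; return the new subtree root."""
    x = y.left
    y.left = x.right
    x.right = y
    return x

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

root = Node(30, Node(20), Node(40, Node(35), Node(45)))
before = inorder(root)
root = rotate_left(root)   # 40 moves up, 30 moves down to the left
after = inorder(root)
```

After the rotation 40 is the subtree root, 30 is its left child, and 35 has moved from being 40's left child to being 30's right child, yet the in-order sequence is unchanged. The caller must store the returned node in place of the old subtree root.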

AVL: A balanced binary search tree according to the following specification: the heights of the two child subtrees of any node differ by at most one.

Red-Black Tree: A balanced binary search tree using a balancing algorithm based on colors assigned to a node, and the colors of nearby nodes.

AA Tree: A balanced binary search tree, in fact a more restrictive variation of a red-black tree.
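
The AVL condition above can be checked directly. The following sketch (the function name `check_avl` and the node layout are assumptions for illustration) verifies only the height condition, not the search-tree ordering, and does not perform any rebalancing:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def check_avl(node):
    """Return (balanced, height) for the subtree rooted at node.

    Height is counted in edges, with -1 for an empty subtree.
    balanced is True when, at every node, the heights of the two
    child subtrees differ by at most one.
    """
    if node is None:
        return True, -1
    left_ok, left_h = check_avl(node.left)
    right_ok, right_h = check_avl(node.right)
    ok = left_ok and right_ok and abs(left_h - right_h) <= 1
    return ok, 1 + max(left_h, right_h)

balanced = Node(50, Node(30, Node(20), Node(40)), Node(90, None, Node(100)))
chain = Node(1, None, Node(2, None, Node(3)))   # degenerate, list-like tree

print(check_avl(balanced))   # (True, 2)
print(check_avl(chain))      # (False, 2)
```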

Binary Search Trees
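A typical binary search tree looks like this: the root is 50, whose children are 30 and 90; 30 has the children 20 and 40, and 90 has the single child 100.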



Terms
Node: Any item that is stored in the tree.

Root: The top item in the tree. (50 in the tree above)

Child: Node(s) under the current node. (20 and 40 are children of 30 in the tree above)

Parent: The node directly above the current node. (90 is the parent of 100 in the tree above)

Leaf: A node which has no children. (20 is a leaf in the tree above)

Searching through a binary search tree
To search for an item in a binary search tree:


 * 1) Start at the root node.
 * 2) Compare the item that you are searching for with the current node. If it is equal, you have found the item. If it is less, move to the left child; if it is greater, move to the right child.
 * 3) Repeat step 2 at the new node.
 * 4) Continue until you find the item or until the current node has no child on the required branch, in which case the tree doesn't contain the item.

Example


For example, to find the node 40...

 * 1) The root node is 50, which is greater than 40, so you go to 50's left child.
 * 2) 50's left child is 30, which is less than 40, so you next go to 30's right child.
 * 3) 30's right child is 40, so you have found the item that you are looking for :)
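
The search walk can be sketched in Python as follows. The `Node` class and the choice to return the visited path alongside the result are illustrative assumptions, made so that the example above can be replayed:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def search(root, key):
    """Return (node, path): the matching node or None, and the keys visited."""
    path = []
    node = root
    while node is not None:
        path.append(node.key)
        if key == node.key:
            return node, path                    # found the item
        # smaller keys live to the left, larger keys to the right
        node = node.left if key < node.key else node.right
    return None, path                            # ran out of children

root = Node(50, Node(30, Node(20), Node(40)), Node(90, None, Node(100)))

found, path = search(root, 40)
print(path)      # [50, 30, 40]

missing, miss_path = search(root, 25)
print(missing)   # None
```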

Adding an item to a binary search tree

 * 1) To add an item, you first must search through the tree to find the position that you should put it in. You do this following the steps above.
 * 2) When you reach a node which doesn't contain a child on the correct branch, add the new node there.

For example, to add the node 25...


 * 1) The root node is 50, which is greater than 25, so you go to 50's left child.
 * 2) 50's left child is 30, which is greater than 25, so you go to 30's left child.
 * 3) 30's left child is 20, which is less than 25, so you go to 20's right child.
 * 4) 20's right child doesn't exist, so you add 25 there :)
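
A minimal recursive sketch of insertion follows. It assumes, as in the definition of a binary search tree given earlier, that keys equal to a node's key go to the right; the names are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def insert(node, key):
    """Insert key into the subtree rooted at node; return the subtree root."""
    if node is None:
        return Node(key)          # empty spot on the correct branch: add here
    if key < node.key:
        node.left = insert(node.left, key)
    else:                         # assumption: equal keys go to the right
        node.right = insert(node.right, key)
    return node

root = Node(50, Node(30, Node(20), Node(40)), Node(90, None, Node(100)))
root = insert(root, 25)
new_node = root.left.left.right   # 50 -> 30 -> 20 -> right child
print(new_node.key)               # 25
```

Returning the subtree root from every call means the same function also handles insertion into an empty tree.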

Deleting an item from a binary search tree
It is assumed that you have already found the node that you want to delete, using the search technique described above.

Case 1: The node you want to delete is a leaf


For example, to delete 40...


 * Simply delete the node!

Case 2: The node you want to delete has one child

 * 1) Directly connect the child of the node that you want to delete, to the parent of the node that you want to delete.



For example, to delete 90...


 * Delete 90, then make 100 the child node of 50.

Case 3: The node you want to delete has two children
One non-standard way is to rotate the node into a chosen subtree and attempt to delete the key again from that subtree, recursively, until Case 1 or Case 2 occurs. This could unbalance the tree, so randomly choosing whether to rotate right or left may help.

The standard way is to pick either the left or the right child, say the right, and find the right child's leftmost descendant by following left links, starting from the right child, until the next left link is null. Remove this leftmost descendant, replacing it with its right subtree (its left child is null). Then copy the key and value of the removed descendant into the node being deleted, so that the node keeps its position but now holds the successor's contents. This still maintains the key ordering for all nodes. Example Java code is below in the treap example code.

The following examples use the standard algorithm, that is, the successor is the left-most node in the right subtree of the node to be deleted.



For example, to delete 30


 * 1) The right node of the node which is being deleted is 40.
 * 2) (From now on, we continually go to the left node until there isn't another one...) The first left node of 40 is 35.
 * 3) 35 has no left node, therefore 35 is the successor!
 * 4) 35 replaces 30 in the node being deleted, and the node that originally held 35 is removed and replaced by its right subtree, whose root is 37.

Case 1 of two-children case: The successor is the right child of the node being deleted

 * 1) Directly move the right child of the node being deleted into the position of the node being deleted.
 * 2) As the new node has no left child, you can connect the deleted node's left subtree's root as its left child.



For example, to delete 30


 * 1) Replace the contents to be deleted (30) with the successor's contents (40).
 * 2) Delete the successor node (contents 40), replacing it with its right subtree (head contents 45).

Case 2 of two-children case: The successor isn't the right child of the node being deleted
This is best shown with an example



To delete 30...


 * 1) Replace the contents to be deleted (30) with the successor's contents (35).
 * 2) Replace the successor (35) with its right subtree (37). There is no left subtree because the successor is leftmost.
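
The deletion cases can be combined into one routine. The sketch below is an illustration of the standard (successor-based) algorithm, assuming distinct keys; the node layout and the surrounding tree (50, 20, 90, 100 around the 30/40/35/37 example) are assumptions made so that the code can run.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def delete(node, key):
    """Delete key from the subtree rooted at node; return the subtree root."""
    if node is None:
        return None                       # key not present
    if key < node.key:
        node.left = delete(node.left, key)
    elif key > node.key:
        node.right = delete(node.right, key)
    else:
        # Leaf or one child: hand the (possibly empty) child to the parent.
        if node.left is None:
            return node.right
        if node.right is None:
            return node.left
        # Two children: the successor is the leftmost node of the right subtree.
        succ = node.right
        while succ.left is not None:
            succ = succ.left
        node.key = succ.key                           # copy successor's contents
        node.right = delete(node.right, succ.key)     # then unlink the successor
    return node

def inorder(node):
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

root = Node(50,
            Node(30, Node(20), Node(40, Node(35, None, Node(37)))),
            Node(90, None, Node(100)))

root = delete(root, 30)       # two children: 35 takes 30's place, 37 moves up
after_two_children = inorder(root)
root = delete(root, 90)       # one child: 100 is connected to 50
root = delete(root, 20)       # leaf: simply removed
```

Because the successor never has a left child, the recursive call that unlinks it always ends in the leaf or one-child branch.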

Red-Black trees
A red-black tree is a self-balancing binary search tree that assigns a color to each of its nodes. The structure of a red-black tree must adhere to a set of rules which dictate how nodes of a certain color can be arranged. These rules are enforced whenever the tree is modified, causing the rotation and recolouring of certain nodes when a new node is inserted or an old node is deleted. This keeps the red-black tree balanced, guaranteeing a search complexity of O(log n).

The rules that a red-black tree must adhere to are as follows:


 * 1) Each node must be either red or black.
 * 2) The root is always black.
 * 3) All leaves within the tree are black (leaves do not contain data and can be modelled as null or nil references in most programming languages).
 * 4) Every red node must have two black child nodes.
 * 5) Every path from a given node to any of its descendant leaves must contain the same number of black nodes.
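
The rules lend themselves to a mechanical check. The sketch below is illustrative only: the `Node` class, the color constants and the function names are assumptions, leaves are modelled as `None`, and the search-tree ordering itself is not verified.

```python
RED, BLACK = "red", "black"

class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right

def black_height(node):
    """Black nodes on every path from node to a nil leaf (nil included).

    Raises ValueError if rule 1, 4 or 5 is violated below node.
    """
    if node is None:
        return 1                                  # rule 3: nil leaves are black
    if node.color not in (RED, BLACK):
        raise ValueError("rule 1: node is neither red nor black")
    if node.color == RED:
        for child in (node.left, node.right):
            if child is not None and child.color == RED:
                raise ValueError("rule 4: red node has a red child")
    left = black_height(node.left)
    right = black_height(node.right)
    if left != right:
        raise ValueError("rule 5: unequal black counts")
    return left + (1 if node.color == BLACK else 0)

def is_red_black(root):
    if root is not None and root.color != BLACK:
        return False                              # rule 2: the root is black
    try:
        black_height(root)
    except ValueError:
        return False
    return True

valid = Node(50, BLACK,
             Node(30, RED, Node(20, BLACK), Node(40, BLACK)),
             Node(90, BLACK, None, Node(100, RED)))
red_red = Node(50, BLACK, Node(30, RED, Node(20, RED)))
uneven = Node(50, BLACK, Node(30, BLACK))
```

Here `valid` satisfies all five rules, `red_red` breaks rule 4, and `uneven` breaks rule 5.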

A red-black tree can be modelled as a 2-3-4 tree, which is a sub-class of B tree (below). A black node with one red child can be seen as linked together with that child as a 3-node, and a black node with two red child nodes can be seen as a 4-node.

4-nodes are split, producing two 2-nodes, and the middle node is made red. This turns a parent of the middle node that has no red child from a 2-node into a 3-node, and turns a parent with one red child into a 4-node (but this doesn't occur when red nodes are always left-leaning).

An in-line arrangement of two red nodes is rotated into a parent with two red children, a 4-node, which is later split as described before.

        A        right rotate          'split 4-node'        |red
    red/ \           -->         B          --->             B
      B                      red/ \red                      / \
  red/ \                       C   A                       C   A
    C   D                         /                           /
                                 D                           D

An optimization mentioned by Sedgewick is that all right-inserted red nodes are left rotated to become left red nodes, so that only in-line left red nodes ever have to be rotated right before splitting. AA trees (above), described by Arne Andersson in a paper in 1993, seem to be an earlier exposition of the simplification. Andersson, however, suggested right-leaning 'red marking' rather than the left-leaning marking suggested by Sedgewick, so AA trees seem to have precedence over left-leaning red-black trees. It would be quite a shock if the Linux CFS scheduler was described in the future as 'AA based'.

In summary, red-black trees are a way of detecting two insertions into the same side and levelling out the tree before things get worse. Two left-sided insertions will be rotated, and two right-sided insertions will look like two left-sided insertions after the left rotation that removes right-leaning red nodes. Two balanced insertions for the same parent could result in a 4-node split without rotation, so the question arises as to whether a red-black tree could be attacked with serial insertions of one-sided triads of a < P < b, and then the next triad's P' < a.

Python illustrative code follows
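
As a minimal sketch of the left-leaning scheme described above, the insertion below follows the widely published variant in which a node with two red children is split after the recursive call returns. It is an illustration under that assumption, not necessarily the listing this page originally carried; all names are illustrative, and deletion is omitted.

```python
RED, BLACK = True, False

class Node:
    def __init__(self, key):
        self.key = key
        self.color = RED          # new nodes are always inserted red
        self.left = None
        self.right = None

def is_red(node):
    return node is not None and node.color == RED

def rotate_left(h):
    x = h.right
    h.right = x.left
    x.left = h
    x.color = h.color
    h.color = RED
    return x

def rotate_right(h):
    x = h.left
    h.left = x.right
    x.right = h
    x.color = h.color
    h.color = RED
    return x

def split(h):
    """Split a 4-node: the middle node turns red, its children black."""
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK

def _insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = _insert(h.left, key)
    elif key > h.key:
        h.right = _insert(h.right, key)
    # (an equal key is already present: nothing to add)
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)        # right-leaning red becomes left-leaning
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)       # two in-line left reds: rotate right
    if is_red(h.left) and is_red(h.right):
        split(h)                  # two red children: split the 4-node
    return h

def insert(root, key):
    root = _insert(root, key)
    root.color = BLACK            # the root is always black
    return root

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

root = None
for key in range(1, 1024):        # already sorted input, the worst case for a plain tree
    root = insert(root, key)
```

Even though the 1023 keys arrive in sorted order, the longest path stays within the red-black bound of 2 log2(n + 1) nodes, which is 20 here.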

B Trees
Summary

 * Whereas binary search trees have nodes with two children, where the left child and all of its descendants are less than the "value" of the node and the right child and all of its descendants are more than the "value" of the node, a B-tree is a generalization of this.
 * The generalization is that instead of one value, the node has a list of values, and the list is of size n (n > 2). n is chosen to optimize storage, so that a node corresponds in size to a disk block, for instance. This design dates from before SSDs, but following binary nodes stored individually on an SSD would still be slower than reading a block of values from the SSD into RAM and CPU cache and searching the loaded list.
 * At the start of the list, the left child of the first element has a value less than the first element, and so do all of its descendants. To the right of the first element is a child whose values are more than the first element's value but less than the second element's value, as are those of all its descendants. By induction, the same holds for the child between elements 1 and 2, 2 and 3, and so on up to elements n-1 and n.
 * Inserting into a non-full B-tree node is an insertion into a sorted list.
 * In a B+ tree, insertions can only be done in leaf nodes, and non-leaf nodes hold copies of a demarcating value between adjacent child nodes, e.g. the leftmost value of an element's right child's list of values.
 * Whenever a list becomes full, e.g. it holds n values, the node is "split": two new nodes are made, and the demarcating value is passed up to the parent.
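
The search rule in the list above can be sketched as follows, for a classical B-tree in which internal nodes hold real entries. The `BNode` class and the `contains` function are assumptions for illustration, and the standard library's `bisect` module stands in for the search within a node's sorted list.

```python
import bisect

class BNode:
    """Hypothetical B-tree node: a sorted key list plus one more child than keys."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list for a leaf node

def contains(node, key):
    """Search for key, descending one node per level."""
    while True:
        # Index of the first key in this node that is >= key.
        i = bisect.bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False                 # reached a leaf without finding key
        # Child i holds the values between keys[i-1] and keys[i].
        node = node.children[i]

root = BNode([20, 40], [BNode([5, 10]), BNode([25, 30]), BNode([45, 50])])

print(contains(root, 30))   # True
print(contains(root, 35))   # False
```

Each step of the loop touches a single node, which is the unit that would be read from storage as one block.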

B Trees were described originally as generalizations of binary search trees, where a binary tree is a 2-node B-Tree, the 2 standing for two children, with 2-1 = 1 key separating the 2 children. Hence a 3-node has 2 values separating 3 children, and an N-node has N children separated by N-1 keys.

A classical B-Tree can have N-node internal nodes, and empty 2-nodes as leaf nodes, or more conveniently, the children can either be a value or a pointer to the next N-node, so it is a union.

The main idea with B-trees is that one starts with a root N-node, which is able to hold N-1 entries. On the Nth entry the node's key capacity is exhausted, and the node is split into two half-full N-nodes separated by a single key K, which is equal to the right node's leftmost key: any entry with a key K2 greater than or equal to K goes in the right node, and anything less than K goes in the left. When the root node is split, a new root node is created with one key, a left child and a right child. Since there are N children but only N-1 entries, the leftmost child is stored as a separate pointer. If the leftmost child splits, its left half becomes the new leftmost pointer, and its right half and the separating key are inserted at the front of the entries.
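
The split rule can be illustrated on a single node's key list. This is a sketch under simplifying assumptions: only keys are stored, child pointers and the parent update are left out, and the function name `insert_with_split` is hypothetical.

```python
import bisect

def insert_with_split(keys, key, n):
    """Insert key into the sorted key list of an N-node (n = N).

    The list may hold at most n - 1 keys. Returns (left, separator, right);
    separator and right are None when the node did not need to split.
    Note that keys is modified in place by the insertion.
    """
    bisect.insort(keys, key)             # insertion into a sorted list
    if len(keys) < n:
        return keys, None, None          # still room: no split
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The separating key K equals the right node's leftmost key, so keys
    # greater than or equal to K belong to the right node.
    return left, right[0], right

no_split = insert_with_split([10, 30], 20, 4)
split = insert_with_split([10, 20, 30], 25, 4)

print(no_split)   # ([10, 20, 30], None, None)
print(split)      # ([10, 20], 25, [25, 30])
```

In a full implementation the separator and the new right node would then be inserted into the parent, which may split in turn.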

An alternative is the B+ tree, which is the variant most commonly used in database systems. Values are stored only in leaf nodes, whereas internal nodes store only keys and pointers to other nodes, so the size of each datum in an internal node is limited to the size of a pointer. This often allows more entries to fit into an internal node of a given block size; 4K, for example, is a common physical disc block size. Hence, if a B+ tree internal node is aligned to a physical disc block, the main rate-limiting cost, which is reading a block of a large index from disc because it isn't cached in memory, is reduced to one block read.

A B+ tree's internal nodes hold more entries, so in theory it is wider and shorter than an equivalent B tree, which must fit values as well as keys into nodes of the same physical block size. Overall it is therefore a faster index, thanks to greater fan-out and less height to reach keys on average.

This fan-out is apparently so important that compression can also be applied to the blocks to increase the number of entries that fit within a given underlying layer's block size (the underlying layer is often a filesystem block).

Most database systems use the B+ tree algorithm, including PostgreSQL, MySQL, Derby, Firebird, many Xbase index types, etc.

Many filesystems also use a B+ tree to manage their block layout (e.g. XFS, NTFS, etc.).

Transwiki has a Java implementation of a B+ Tree which uses traditional arrays as key list and value list.

Below is an example of a B Tree with test driver, and a B+ tree with a test driver. The memory / disc management is not included, but a usable hacked example can be found at /Hashing Memory Checking Example/.

This B+ tree implementation was derived from the B Tree, and the difference from the transwiki B+ tree is that it tries to use the semantics of SortedMap and SortedSet already present in the standard Java collections library.

Hence, the flat leaf block list of this B+ implementation can't contain blocks that don't contain any data, because the ordering depends on the first key of the entries, so a leaf block needs to be created with its first entry.

A B+ tree Java example
Experiments include timing the run (e.g. time java -cp . btreemap.BPlusTreeTest1) with an external (leaf) block size of blocksize + 1, so that the structure is basically just the underlying entries TreeMap, versus, say, an internal node size of 400 and an external node size of 200. Other experiments include using a SkipListMap instead of a TreeMap.

Treaps
The invariant in a binary search tree is that left is less than right with respect to insertion keys, e.g. for keys with an order, ord(L) < ord(R). This does not dictate which node is the parent of which, however, and left and right rotations do not affect the invariant. Therefore another order can be imposed on the parent-child relationship. If that order is randomised, it is likely to counteract any skew of a plain binary tree, e.g. when already sorted input is inserted in order.
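
A minimal Python sketch of this idea follows. It assumes a max-heap on the priorities (a parent's priority is at least that of its children) and random priorities drawn from a seeded generator; all names are illustrative, and deletion is omitted.

```python
import random

class Node:
    def __init__(self, key, priority):
        self.key = key
        self.priority = priority
        self.left = None
        self.right = None

def rotate_right(node):
    left = node.left
    node.left = left.right
    left.right = node
    return left

def rotate_left(node):
    right = node.right
    node.right = right.left
    right.left = node
    return right

def insert(node, key, rng):
    """Ordinary BST insert, then rotate the new node up while its
    priority is greater than its parent's."""
    if node is None:
        return Node(key, rng.random())
    if key < node.key:
        node.left = insert(node.left, key, rng)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key, rng)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)

def heap_ordered(node):
    """True when no child has a greater priority than its parent."""
    if node is None:
        return True
    for child in (node.left, node.right):
        if child is not None and child.priority > node.priority:
            return False
    return heap_ordered(node.left) and heap_ordered(node.right)

rng = random.Random(42)           # fixed seed, chosen arbitrarily
root = None
for key in range(500):            # already sorted input
    root = insert(root, key, rng)
```

The rotations keep the key order intact while restoring the priority order, so sorted input no longer produces a list-like tree; the height stays far below the 500 nodes a plain binary search tree would reach.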

Below is a Java example implementation, including a plain binary tree delete code example.