Algorithm Implementation/Trees/B+ tree

In computer science, a B+ tree is a type of tree data structure. It represents sorted data in a way that allows for efficient insertion and removal of elements. It is a dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node.

A B+ tree is a variation on a B-tree. In a B+ tree, unlike a B-tree, all data records are stored in the leaves. Internal nodes contain only keys and child pointers. All leaves are at the same, lowest level. The leaf nodes are also chained together in a linked list, so a range query can locate its first key and then scan the following leaves in order.
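To make the leaf chain concrete, here is a minimal C++ sketch. It assumes a hypothetical Leaf type that stores sorted int keys and a next pointer to its right sibling. The hypothetical function range_scan assumes the caller has already descended from the root to the leaf that would contain the lower bound. From there it only follows next pointers and never revisits internal nodes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical leaf node for illustration: keys are kept in sorted order
// and `next` links each leaf to its right-hand sibling.
struct Leaf {
    std::vector<int> keys;  // sorted keys held by this leaf
    Leaf* next = nullptr;   // right sibling, or nullptr for the last leaf
};

// Collect every key in [lo, hi] by walking the leaf chain. In a real
// B+ tree, `start` would be the leaf reached by descending from the root
// with `lo`; here the caller passes it in directly.
std::vector<int> range_scan(const Leaf* start, int lo, int hi) {
    assert(lo <= hi);
    std::vector<int> out;
    for (const Leaf* leaf = start; leaf != nullptr; leaf = leaf->next) {
        for (int k : leaf->keys) {
            if (k > hi) return out;  // keys are sorted across leaves: stop early
            if (k >= lo) out.push_back(k);
        }
    }
    return out;
}
```

Because the keys are sorted across the whole chain, the scan can stop at the first key greater than hi. Its cost is therefore the initial descent plus the number of leaves it actually touches.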

The maximum number of keys in a node is called the order of the B+ tree. (Some authors instead define the order as the maximum number of children.)

Every node except the root must hold at least half the maximum number of keys. For example, if the order of a B+ tree is n, each non-root node must have between n/2 and n keys.
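The occupancy rule can be written as a small predicate. This sketch rests on two assumptions the text leaves open: n/2 is rounded down for odd n, and the root of a non-empty tree needs at least one key. The function name occupancy_ok is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Occupancy rule: with order n (maximum keys per node), every non-root
// node holds between n/2 and n keys. Assumption: n/2 is integer (floor)
// division for odd n; some texts round it up instead.
bool occupancy_ok(std::size_t keys, std::size_t n, bool is_root) {
    assert(n >= 2);                   // smaller orders make the bounds meaningless
    if (keys > n) return false;       // overfull: the node must be split
    if (is_root) return keys >= 1;    // assumption: root of a non-empty tree
    return keys >= n / 2;             // underfull: needs a merge or a borrow
}
```

For n = 4, a non-root node with 1 key is underfull, and a node with 5 keys must be split.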

The number of keys that may be indexed using a B+ tree is a function of the order of the tree and its height.

For an n-order B+ tree of height h:
 * the maximum number of keys is $$n^h$$;
 * the minimum number of keys is $$2(n/2)^{h-1}$$.
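As a worked example, for n = 100 and h = 3 the formulas give a maximum of 100^3 = 1,000,000 keys and a minimum of 2 · 50^2 = 5,000 keys. The sketch below evaluates the two formulas exactly as stated. The helper names are hypothetical, n/2 uses integer division (exact for even n), and no overflow checking is done.

```cpp
#include <cassert>
#include <cstdint>

// Integer power by repeated multiplication (no overflow check; keep inputs small).
std::uint64_t ipow(std::uint64_t base, unsigned exp) {
    std::uint64_t result = 1;
    while (exp-- > 0) result *= base;
    return result;
}

// Bounds as given in the text for an n-order tree of height h:
//   maximum = n^h,  minimum = 2 (n/2)^(h-1).
std::uint64_t max_keys(std::uint64_t n, unsigned h) { return ipow(n, h); }

std::uint64_t min_keys(std::uint64_t n, unsigned h) {
    assert(h >= 1);
    return 2 * ipow(n / 2, h - 1);  // n / 2 is integer division
}
```

Even a shallow tree covers a wide range of sizes: with order 100, three levels index anywhere from thousands to a million keys.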

The B-tree, on which the B+ tree is based, was first described in the paper "Rudolf Bayer, Edward M. McCreight: Organization and Maintenance of Large Ordered Indices. Acta Informatica 1: 173–189 (1972)".

Sample implementation in C++
Caution: the following code examples do not implement a proper B+ tree. Although they are very similar to one, they do not satisfy all of the B+ tree criteria (as the authors admit in some comments).

This code snippet has been tested under Linux on a 32-bit x86 computer. Deletion of keys has not been implemented yet. It could be added lazily: instead of removing a key, mark it as deleted, and rebuild the tree from scratch whenever there are as many deleted keys as non-deleted keys. Finding and marking a key costs O(log n). The rebuild takes O(n) time but happens only after a number of deletions proportional to n, so its amortized cost is only O(1) per deletion. Because of the occasional O(n) pause, however, this approach would not be appropriate for real-time systems.
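A minimal sketch of this lazy scheme follows. It is illustrative only: std::map stands in for the B+ tree, the class name LazyIndex is hypothetical, and each key carries a "deleted" flag (a tombstone) instead of being physically removed. When the deleted keys become as numerous as the live ones, rebuild() copies the live keys, which are already in sorted order, and reconstructs the index in linear time, since std::map's range constructor is linear for sorted input. In an actual B+ tree, the rebuild could instead be a bottom-up bulk load of the leaves.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Lazy deletion sketch. std::map stands in for the B+ tree; the mapped
// bool is a tombstone flag (true = deleted).
class LazyIndex {
public:
    void insert(int key) {
        auto it = index_.find(key);
        if (it == index_.end()) {
            index_.emplace(key, false);
            ++live_;
        } else if (it->second) {  // re-inserting a deleted key revives it
            it->second = false;
            --dead_;
            ++live_;
        }
    }

    bool contains(int key) const {
        auto it = index_.find(key);
        return it != index_.end() && !it->second;
    }

    // O(log n) to find and mark the key, plus an occasional O(n) rebuild.
    void erase(int key) {
        auto it = index_.find(key);
        if (it == index_.end() || it->second) return;  // absent or already deleted
        it->second = true;
        --live_;
        ++dead_;
        if (dead_ >= live_) rebuild();  // as many deleted keys as live ones
    }

    std::size_t live() const { return live_; }
    std::size_t stored() const { return index_.size(); }  // includes tombstones

private:
    // Copy the live keys (already sorted) and rebuild in linear time.
    void rebuild() {
        std::vector<std::pair<int, bool>> kept;
        kept.reserve(live_);
        for (const auto& kv : index_)
            if (!kv.second) kept.emplace_back(kv.first, false);
        index_ = std::map<int, bool>(kept.begin(), kept.end());
        dead_ = 0;
    }

    std::map<int, bool> index_;  // key -> deleted?
    std::size_t live_ = 0;
    std::size_t dead_ = 0;
};
```

With keys 1 to 4, erasing 2 only marks it (1 deleted versus 3 live). Erasing 3 as well makes the counts equal and triggers a rebuild that leaves just keys 1 and 4 in storage.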

The implementation uses the Boost library for compile-time assertions and efficient memory allocation. The latter could be done with the  instead, at some cost in performance.
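The sketch below illustrates the same two ideas using only the standard library. It is a stand-in for Boost, not a copy of the original code. static_assert, the C++11 counterpart of BOOST_STATIC_ASSERT, rejects a meaningless node order at compile time. A hypothetical NodePool recycles released nodes through a free list instead of returning them to the general-purpose allocator. InnerNode, its field layout, and NodePool are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical inner node of order N (maximum keys). The static_assert
// fires when the template is instantiated with an unusable order.
template <std::size_t N>
struct InnerNode {
    static_assert(N >= 2, "a B+ tree node must be able to hold at least 2 keys");
    int keys[N];
    void* children[N + 1];
    std::size_t count = 0;
};

// Minimal free-list pool. Nodes live in a std::deque, whose elements keep
// their addresses when it grows at the end, and released nodes are reused
// before any new node is created.
template <typename Node>
class NodePool {
public:
    Node* acquire() {
        if (!free_.empty()) {
            Node* n = free_.back();
            free_.pop_back();
            *n = Node{};  // reset the recycled node to a clean state
            return n;
        }
        storage_.emplace_back();
        return &storage_.back();
    }

    void release(Node* n) {
        assert(n != nullptr);
        free_.push_back(n);
    }

    std::size_t allocated() const { return storage_.size(); }

private:
    std::deque<Node> storage_;  // owns every node ever created
    std::vector<Node*> free_;   // nodes available for reuse
};
```

Declaring a variable of type InnerNode<1> fails to compile, which is the point of the compile-time check. A production pool would typically also need thread safety and a way to release memory in bulk, which dedicated libraries such as Boost.Pool are designed to address.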

Violation of the B+ tree definition during inner node split
After converting the code to Java, I realized that the way the following code splits inner nodes (insertInner) violates the B+ tree definition. For example, with N = 4, a split produces three nodes: a root with 1 key, the original node with 1 key, and the sibling with 3 keys. The original node therefore breaks the B+ tree rule that every inner node must be at least half full. It looks like the canonical algorithm is still needed to get this right; there is no shortcut here. So this algorithm is not a strict B+ tree, but it works. In any case, I converted the above C++ code to Java, shown below.

The Java version doesn't really work, because a generic array cannot be obtained by casting a newly created array of a concrete ancestor type such as Object.

That is, T[] a = new T[10] does not compile; see the field declarations of LNode.
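The inner-node split problem described in the comment above can be avoided with the canonical split, sketched here in C++. The sketch assumes a hypothetical Inner type holding int keys and child pointers, with n as the maximum number of keys. A node that has overflowed to n + 1 keys keeps the lower half, sends the middle key up to its parent as the separator, and moves the upper half to a new right sibling. For n = 4 this gives two halves of 2 keys each, so neither node is less than half full, unlike the 1/3 split described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical inner node: children.size() == keys.size() + 1.
struct Inner {
    std::vector<int> keys;
    std::vector<Inner*> children;
};

// Separator key to push into the parent, plus the new right sibling.
// The original node becomes the left half.
struct SplitResult {
    int separator;
    Inner right;
};

// Canonical split of an inner node that has just overflowed to n + 1 keys.
// The left half keeps (n + 1) / 2 keys, the middle key moves up, and the
// right half gets n - (n + 1) / 2 == n / 2 keys, so both halves keep at
// least n / 2 keys (for n = 4: 2 left, 1 up, 2 right).
SplitResult split_inner(Inner& node, std::size_t n) {
    assert(node.keys.size() == n + 1);
    assert(node.children.size() == n + 2);
    const std::size_t mid = (n + 1) / 2;

    SplitResult r;
    r.separator = node.keys[mid];
    r.right.keys.assign(node.keys.begin() + mid + 1, node.keys.end());
    r.right.children.assign(node.children.begin() + mid + 1, node.children.end());

    node.keys.resize(mid);          // left half keeps keys[0, mid)
    node.children.resize(mid + 1);  // and children[0, mid]
    return r;
}
```

For simplicity the sibling is returned by value. The caller still has to insert the separator and a pointer to the (heap-allocated) sibling into the parent, and split the parent in turn if it overflows; that step is omitted here.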