Structured Query Language/Recursions

Sometimes the rows of one table are structured in such a way that they represent a hierarchy or a network within this table. Typical use cases are management structures, bill of materials (a machine consists of several smaller machines, …), or network structures (e.g.: flight plans).

To retrieve particular rows and all rows that correlate to them, one can use set operations in combination with subqueries to merge them together to one result set. But this technique is limited as one must exactly know the number of levels. Apart from the fact that the number of levels changes from case to case, the subselect syntax differs from level to level. To overcome these restrictions, SQL offers a syntax to express queries in a recursive manner. They retrieve the rows of all affected levels, independent from their number.

Syntax
The SQL standard uses a special form of its, which is explained on the  previous page, to define recursive queries. The clause occurs before a SELECT, INSERT, UPDATE, or DELETE keyword and is part of the appropriate command.

Hint: The  (with or without the 'RECURSIVE' key word) is often referred to as a 'common table expression (CTE)'.

Hint: Oracle supports the syntax of the SQL standard since version 11.2. MySQL 8.0 supports the RECURSIVE keyword. Earlier MySQL versions do not support recursions at all and recommend procedural workarounds.

The evaluation sequence is as follows:
 * 1) The initial query to a real table or a view is executed and creates the start point for step 2.
 * 2) Usually, the repetitive query consists of a join between the real table or view and the result set build up so far. This step is repeated until no new rows are found.
 * 3) The result sets from step 1. and 2. are merged together.
 * 4) The final SELECT acts on the result of step 3.

Example Table
To demonstrate recursive queries, we define an example table. It holds information about persons and their ancestors. Because ancestors are always persons, everything is stored in the same table. father_id and mother_id acts as references to the rows where father's and mother's information is stored. The combination of father_id, mother_id and firstname acts as a criterion, which uniquely identifies rows according to those three values (we suppose that parents give their children different names).

Basic Queries
As a first example, we retrieve Mr. Karl Miller and all his descendants. To do so, we must retrieve his own row and define a rule, how to 'navigate' from level to level within the family tree.

You can use all language features of SQL to process the intermediate table further. (It isn't a real table, it is only an intermediate result with the structure of a table). For example, to count the number of descendants.

To demonstrate the problems in situations where no recursive SELECT is available, we show a syntax with subqueries.

Every level has its own syntax, e.g., to retrieve grandchildren, we need a subquery within a subquery.

As a second example, we traverse the hierarchy in the opposite direction: from a person to their patrilineal (male-line) ancestors. In comparison to the first example, two things change. The start point of the query is no longer Mr. Karl Miller, as he has no ancestor in our example table. And we have to change the join condition by swapping id and father_id.

Notice the Level
Sometimes we need to know to which level within the hierarchy or network a row belongs to. To display this level, we include a pseudo-column with an arbitrary name into the query. We choose the name hier_level (as level is a reserved word in the context of savepoints).

The level is now available, and we can use it as an additional condition, eg. for a restriction to the first two levels.

Create Paths
Sometimes we want to build a path from the starting point of the hierarchy or network to the actual row, eg. for a faceted classification like 1.5.3 or for a simple numbering of the visited nodes. This can be achieved in a similar way as the computing of the level. We need a pseudo-column with an arbitrary name and append actual values to those that have already been formed.

Depth First / Breadth First
There are two ways to traverse hierarchies and networks. You must decide which kind of nodes you want to process first: child nodes (nodes of the next level) or sibling nodes (nodes of the same level). The two methods are called depth first and breadth first. With the keywords  and   (the default) you can decide between the two variants.

The key words occur between the  and the. Since - as opposed to a tree in a programming language like JAVA or C++ or like an XML instance - rows of a table have no implicit order, you must define an order for the nodes within their level. This is done behind the  key word. After the  key word, define the name of an additional pseudo-column, where a numbering over all rows is stored automatically.

There are some notable remarks to the above query:
 * 1) In contrast to the other queries on this page (where we implicitly have used the default  ), the family tree is traversed in such a way that after every row its 'child' rows are processed. This is significant at level 1.
 * 2) If there is more than one row per level, the rows are ordered according to the   definition: firstname in this case.
 * 3) The rows have a sequence number: sequence_number in this case. You may use this number for any additional processing.

Exercises
Retrieve Chess Miller and all Chess's female ancestors.

Retrieve Chess Miller and all Chess's ancestors: male and female.

To make the situation a little bit more transparent add a number to the previous query which shows the actual level.

To make the situation absolutely transparent replace the level by some kind of path (child / parent / grandparent / ...).

Retrieve all grandchildren of Karl Miller.

Retrieve every person in the table family_tree and show its firstname and the firstname of its very first known ancestor in the male line.

a) How many descendants of Carl Miller are stored in the example table? b) Same question as before, but differentiated per level.