Structured Query Language/Foreign Key

Foreign Keys (FK) define a directed reference from one table (the child) to another table (the parent). This reference acts as long as the involved columns of the two tables contain identical values. It couples one row of the child table to a single row of the parent table - a row of the parent table may be coupled by many rows of the child table.

E.g.: You may have the table department with column id and the table employee with column dept_id. If you want to assign an employee to a distinct department, you store the department-id in its column dept_id. This can be done in every case-independent from any Foreign Key definition. But in such cases people often have two additional requirements: First, employees shall only be assigned to departments which really exist. Second, as long as employees are assigned to a distinct department, it shall be impossible to delete this department. The main purpose of Foreign Keys is to guarantee these two requirements.

In other words: Foreign Keys guarantee that no orphans will arise.

Foreign Key vs. Join
Within RDBMs, identical values are used to link rows of different - and sometimes of the same - table together. Because this linking works on the basis of values and not of any link or special reference, it has no direction. In general, we call this technique a JOIN. Foreign Keys have a very similar concept because they also link rows with identical values together. But there are important differences:


 * Foreign Keys have a direction. It is important to know which one of the two affected tables is the child table and which one is the parent table.
 * Joins must be expressed within every DML statement, which is interested in this join (with the exception of views). In contrast, Foreign Keys are part of table definitions. All DML commands bear them in mind without expressing them within a DML statement.

Syntax
Rules:
 * FK-constraints can be defined during table definition (CREATE TABLE) or afterward (ALTER TABLE). On this page, we focus on the CREATE TABLE statement. The syntax of the ALTER TABLE statement is very similar.
 * FK-constraints belong to the child table definition.
 * Despite an existing FK-constraint, it is possible that rows of the child table don't belong to any parent row. This occurs if the column value of the child row is NULL. If you want to avoid such situations, define the column as 'NOT NULL'.
 * Although the FK-constraints belong to the child table, they also have consequences for the parent table such that rows of the parent table, which have existing rows in the child table, can not be deleted.
 * The denoted parent table must exist.
 * The denoted column of the parent table must be its Primary Key or a column, which is UNIQUE.
 * It is perfectly all right to use the same table as parent and child table within one FK-constraint, see: Exercises.
 * One table may be subject to a lot of FK-constraints.

Example
The example defines the tables department and employee. The Foreign Key definition of employee declares department as the parent table of employee.

This kind of modeling allows the representation of hierarchical tree structures. One or many child nodes (rows) belong to a single parent node (row). In the context of DBMS, this kind of association is called a 1:m relationship.

n:m Relationship
In the real world, there are more association types than 1:m relationships. Often there are so-called n:m relationships where objects (rows) belong to more than 1 other object (row). Thereby the meaning of parent/child tables gets lost. In our example database there is a table hobby and another table person. One person may pursue multiple hobbies. At the same time, multiple persons may pursue the same hobby. This can be designed by creating a third table between the two original tables. The third table holds the IDs of the first and second table. So one can decide which person pursues which hobby.

The technique to realize this n:m situation is the same as shown in the previous chapter with its 1:m association - it is only used twice. We define two Foreign Keys, which start from the 'table-in-the-middle' and refers to the two other tables. In a technical sense, we can say, that the 'table-in-the-middle' is the child table for the two parent tables person and hobby. person and hobby are at the same logical level.

ON DELETE / ON UPDATE
So far, we have assumed that rows of the parent table cannot be deleted if a row in the child table exists, which refers to this parent row. This is the default, but all in all, the SQL standard defines five options to handle this parent/child situation in various ways. The options extend the constraint definition. They are:
 * ON DELETE CASCADE: If a row of the parent table is deleted, then all matching rows in the referencing table are deleted.
 * ON DELETE SET NULL: If a row of the parent table is deleted, then all referencing columns in all matching rows of the child table are set to NULL.
 * ON DELETE SET DEFAULT: If a row of the parent table is deleted, then all referencing columns in all matching rows of the child table are set to the column’s default value.
 * ON DELETE RESTRICT: It is prohibited to delete a row of the parent table if that row has any matching rows in the child table. The point in time when checking occurs can be deferred until COMMIT.
 * ON DELETE NO ACTION (the default): It is prohibited to delete a row of the parent table if that row has any matching rows in the child table. This holds true in ALL cases, even if checking is deferred (see next chapter).

Analog to the ON DELETE option, there is an ON UPDATE option. It defines the same five options for the case of changing a column in the parent table, which is referred by the column of a child table.
 * ON UPDATE CASCADE: Any change to a referenced column in the parent table causes the same change to the corresponding referencing column in matching rows of the child table.
 * ON UPDATE SET NULL: Any change to a referenced column in the parent table causes the corresponding referencing column in matching rows of the child table to be set to null.
 * ON UPDATE SET DEFAULT: Any change to a referenced column in the referenced table causes the corresponding referencing column in matching rows of the referencing table to be set to its default value.
 * ON UPDATE RESTRICT: It is prohibited to change a row of the parent table if that row has any matching rows in the child table. The point in time when checking occurs can be deferred until COMMIT.
 * ON UPDATE NO ACTION (the default): It is prohibited to change a row of the parent table if that row has any matching rows in the child table. This holds true in ALL cases, even if checking is deferred (see next chapter).

If ON DELETE or ON UPDATE are not specified, the default action NO ACTION will occur. In some systems, the NO ACTION is implemented in the sense of the RESTRICT option.

An Example:

Hint 1: The concept of updating Primary Keys is controversial. Hint 2: Not all DBMS support all options.

IMMEDIATE / DEFERRED
There is an additional option to decide at what point in time the evaluation of the Foreign Key definition shall occur. The default behavior is to check it with each UPDATE and DELETE command. The second possibility is deferring the check until the end of the transaction, which is the COMMIT command. The purpose of this deferring is to put applications in the position to modify parent tables before child tables (which may be helpful if they utilize Hibernate).

To define this option, the constraint definition must be extended by the keywords [NOT] DEFERRABLE, which are pre- or postfixed by INITIALLY IMMEDIATE (the default) or INITIALLY DEFERRED to specify the initial state after the CREATE TABLE point in time.

Hint: MySQL does not support the DEFERRABLE option, but the Foreign Key checking can be activated and deactivated dynamically by 'SET foreign_key_checks = 0/1;'

The Chicken-Egg Problem
Sometimes applications run into cyclic dependencies: Table A contains a reference to table B and vice versa, e.g.: A table team contains the columns id, team_name and team_leader (which is an id to a player) and the table player contains the columns id, player_name and team_id.

So far, so bad. When the first team-row shall be inserted, the player-row is missed. When the player-row is inserted first, the team-row is missed.

As we have seen above, there is a DEFER option. Using this option, the FK-constraints must be defined such that they are not evaluated immediately with the INSERT commands. They shall be evaluated after all INSERTs at the COMMIT point in time.

Now we can insert data in any sequence (don't miss to deactivate AUTOCOMMIT).

DROP TABLE / TRUNCATE TABLE
Foreign Keys have implications to DROP TABLE and TRUNCATE TABLE commands. As long as a Foreign Key refers to a parent table, this table cannot be dropped (remove structure and data) or truncated (remove data only). This holds true even if there is no actual row referring any row in the parent table - the existence of the Foreign Key is sufficient to refuse DROP and TRUNCATE.

To use DROP or TRUNCATE, it is necessary to drop the constraint first.

Hint: Some implementations offer a DISABLE/ENABLE command to deactivate constraints temporarily.

Exercises
Is it possible that the parent table of a FK-constraint contains 1 row and the child table is empty?

Is it possible that the child table of a FK-constraint contains 1 row and the parent table is empty?

Create a table genealogy which stores information about people and their ancestors. The columns are: id, first_name, last_name, birth_name, father_id, mother_id.

Extend the table genealogy by two FK-contraints such that the columns 'father_id' and 'mother_id' refer to other rows of this table.

Insert some data into 'genealogy', e.g.: data from your personal family.