PostgreSQL/MVCC

In nearly all cases, PostgreSQL databases must support many clients, which want to add or change data, at the same time. This makes it necessary to protect concurrently running requests from each other - preferably without blocking them. Situations may occur where two clients want to change the same row at the same time or that one client wants to revoke (rollback) his changes while another client may still have tried to read the newest version.

Imagine an Online shop offering the last copy of an article. Two clients display the article at their user interface. After a while, but at the same time, both clients decide to put the article into their shopping cart or even to buy it. Both have seen the article, but only one can be allowed to buy it. The database must enforce an order of the requests, permit the write access to one of them, block the other from writing, and inform the blocked client that the data has been changed by a different process and shall be re-read.

PostgreSQL implements a sophisticated technique to handle concurrent accesses that avoids locking: Multiversion Concurrency Control (MVCC). Instead of locking a row, the MVCC technique creates a new version of that row when a data change takes place. "The main advantage of using the MVCC ... rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading. PostgreSQL maintains this guarantee even when providing the strictest level of transaction isolation through the use of ... Serializable Snapshot Isolation (SSI)".

The implementation of MVCC is based on transaction IDs (XID). Every transaction in a cluster gets a unique sequential number as its ID. Every,  , or   command stores the XID in   or   within the affected rows. ,, and some more are system columns contained in every row. Both are not visible with the usual  command. But you can read them with commands like. The column  contains the XID of the transaction which has created this version of the row and   contains the XID of the transaction which has deleted this version, or zero if the version is not deleted.

So, what's going on in detail when write accesses take place? The following graphic shows details concerning,  , and the regular application data.



An  command creates the very first version of a row. Besides its application data 'x', this version contains the ID of the creating transaction 123 in  and 0 in. indicates that the version exists since transaction 123 and the value 0 in indicates that it is currently not deleted.

Somewhat later, transaction 135 executes an  of this row by changing the application data from 'x' to 'y'. According to the MVCC principles, the data in the old version of the row is not changed. The value 'x' remains as it was before. Only  changes to 135. Now, this version is treated as valid exclusively for transactions with XIDs from 123 to 134. In addition to preserve the data in the old version, the  creates a new version of the complete row with its XID in , 0 in  , and 'y' in the application data (plus all other application data from the old version). This new row version is visible to all future transactions. (Internally, an  command acts as a   command followed by an   command.)

All subsequent  commands behave in the same way as the first one: they put their XID in   of the current version, create a new version with their XID in   and 0 in.

Finally, a row may be deleted by a  command. Even in this case, all versions of the row including the newest one remain in the database - nothing is thrown away. Only  of the last version is set to the XID of the   transaction, which indicates that it is only visible to transactions with older XIDs - in this example from 142 to 820.

In summary, the MVCC technology creates more and more versions of the same row in the table's heap file and leaves them there, even after a  command. Only the youngest version is relevant for all future transactions. But the system must also preserve some of the older ones for a short time because they could still be requested by transactions that had started before the deleting transaction and hence have a smaller XID. Over time, also the older ones goes out of scope for ALL transactions and therefore become ultimately unnecessary. Nevertheless, they do exist physically on the disk and occupy space. They are called dead rows and are part of the so-called bloat.

Please keep in mind:


 * and  indicate the range in which row versions are visible for transactions. This range doesn't imply any direct temporal meaning. The sequence of XIDs reflects only the sequence of transactions' begin events.
 * Internally, an  command acts in the same way as a   command followed by an   command.
 * Nothing is removed - with the consequence that the database occupies more and more disk space. It is obvious that this behavior has to be corrected in some way. The next chapter explains how  and   fulfill this task.

So far this is only a raw description of the principles of MVCC. The implementation considers more problems, e.g.:
 * Changes may be revoked by a  command.
 * After some time the sequence of XIDs may start from zero (wrap-around). In this case  can be smaller than.

Note
XIDs are sequences (with a reserved value to handle wrap-around in pre-9.4 PostgreSQL versions). PostgreSQL knows some configuration parameters concerning transactions and their XIDs with names like xxx_age, e.g.: vacuum_freeze_min_age. For such parameters, the 'age' doesn't specify a period of time but represents a certain number of transactions, e.g., 100 millions.