Design of Main Memory Database System/Concurrency

Table of Contents &mdash;

Chapter 8: Concurrency Management

8.1 Overview
Concurrency is defined as the ability of multiple processes and threads to access and change the data records at the same time. Lower the contention to access and modify data with more users, better is the concurrency and vice versa. A process that access data prevents other process to change the data. This reduces the concurrency. A process that modifies data prevents other process to access or change the data. This reduces the concurrency.

In general, database systems uses two approaches to manage concurrent data access; pessimistic and optimistic. Conflicts cannot be avoided in both the models, it differs only in when the conflicts are dealt.

8.1.1 Pessimistic Concurrency
Pessimistic concurrency systems assume that conflict will occur and it avoids conflicts by acquiring locks on data that is being read or modified, so that no other process can modify that data. In this model, readers block writers and writers block readers.

8.1.2 Optimistic Concurrency
Optimistic concurrency systems assume that transactions are unlikely to modify data that another transaction is modifying. This is implemented by using versioning technique. This allows readers to see the state of the data before the modification occurs as the system maintains previous version of the data record before it actually attempts to change it. In this model readers do not block writers and writers do not block readers. However writers will block writes which will lead to conflicts.

8.2 Concurrency problem
There are some problems that any concurrency control mechanism must address. There are three ways in which things can go wrong. They are lost update problem, uncommitted dependency and Inconsistent Analysis. In all three, individual transactions are right, but when they are interleaved it may produce wrong results.

8.2.1 Lost Update Problem
TODO- Define this

TODO- Draw diagram

Transaction-1 reads tuple1 at time t1 Transaction-2 reads tuple1 at time t2 Transaction-1 updates tuple1 at time t3 Transaction-2 updates tuple1 at time t4

Transaction-1 update on tuple1 at time t3 is lost as Transaction-2 overwrites the update made by Transaction-1 on tuple1 without checking whether it has changed.

8.2.2 Uncommitted dependency
This occurs when one transaction is allowed to access or update a tuple that has been updated by another transaction but not yet committed by that transaction.

TODO- Draw diagram

Transaction-1 updates tuple1 at time t1 Transaction-2 reads tuple1 at time t2 Transaction-1 rolls back the transaction at time t3 In the above sequence, Transaction-2 sees an uncommitted change at time t2, which is undone at time t3. Transaction-2 is operating with wrong value seen at time t2. As a result, transaction-2 might produce incorrect result.

If transaction-2 updates instead of reading tuple t1 at t2, the situation is even worse, it will loose its update on tuple t1, once transaction-1 rolls back.

8.2.3 Inconsistent Analysis
If one transaction is calculating an aggregate summary function on records, while other transactions are updating some of the records involved in the aggregation, the aggregate function may calculate with some values before they are updated and some values after they are updated.

TODO- draw diagram

For example, suppose that transaction-1 is calculating the total number of reservations on all the theatres for particular day; meanwhile transaction-2 is reserving 5 seats on that day, then results of transaction-1 will be off by 5 because transaction-1 reads the value of X after 5 seats have been subtracted from it.

8.3 Locking
All the above three problems can be solved by using concurrency control technique called locking. The basic idea is to deny access to the data record for all transactions, when a transaction is working on it.

A lock is a variable associated with a data item that describes the status of the data item. Generally there is one lock for each data item(record) in DBMS. Locks are used to provide synchronous access to the data items by concurrent transactions. There are two types of locks supported by Unix kind of operating systems pthread mutexes work well with multiple threads and semaphores work well with multiple threads as well as multiple process. Pthread mutexes are called binary locks as they have two states (lockState); locked and unlocked. Semaphores can be used as binary locks as well as counting locks. In our case, binary locks will be used to provide synchronized concurrent access to the data items. Two operations, lockItem and unlockItem are used with binary locking. A transaction requests access to data item X by first issuing lockItem(X) operation. If lockState(X) is 1, then the transaction is forced to wait till lockState(X) becomes 0. If it is zero, then lockState(X) is set to 1, and the transaction is allowed to access data item X. When the transaction is through with using the data item, it issues an unlockItem(X) operation, which sets lockState(X) to zero, so that other transactions shall access X.
 * pthread mutexes
 * semaphores

Data access protocol for locking method is
 * A read transaction will acquire lock on the data before it reads the data.
 * A write transaction will acquire lock on the data before it writes the data.
 * If lock request is denied, then the transaction goes to wait state and tries for the lock periodically until the lock is released by the transaction that acquired it.

In the above locking model,
 * Readers block readers and writers
 * Writers block readers and writers

Concurrency shall be slightly improved on the above model by making readers not blocking other readers, as it will not lead to any inconsistencies. This shall be achieved by introducing another type of lock, which shall be shared by all readers. These locks are called as Shared Locks. Another type of lock, exclusive locks are obtained by writers to block all readers and writers from accessing the data. The data access protocol for this locking method is
 * A read transaction will acquire Shared Lock on the data before it reads the data.
 * A write transaction will acquire Exclusive Lock on the data before it writes the data.
 * Lock request is denied for read operation if another transaction has exclusive lock on the data item.
 * Lock request is denied for write operation if another transaction has read or exclusive lock on the data item.
 * If lock request is denied, then the transaction goes to wait state and tries for the lock periodically until the lock is released by the transaction that acquired it.

In the above locking model,
 * Readers block writers and allow other readers
 * Writers block readers and writers

The above rules shall be summarized as lock compatibility matrix

Shared	Exclusive	No Lock Shared	Yes	No	Yes Exclusive	No	No	Yes No Lock	Yes	Yes	Yes TODO::Above lock compatibility matrix in image

Yes -> compatible, no conflict, lock request will be granted.

No-> not compatible, there is a conflict, so lock request should be denied.

Concurrency problems with locking protocol


 * Lost update problem – transaction-1 waits for exclusive lock forever from time t3 as shared lock is acquired by transaction-1 and transaction-2. Transaction-2 waits for exclusive lock forever from time t4 as shared lock is acquired by transaction-1 and transaction-2. Our lost update problem is solved, but a new problem has occurred. It is called “DeadLock”. We will look into its details later.
 * Uncommitted dependency - Transaction-2 waits for lock when it tries to read tuple1 at time t2 as it is exclusively locked by transaction-1. It waits till transaction-1 either commits or rollbacks. Locking avoids uncommitted dependency issue.
 * Inconsistent Analysis – Transcation-1 waits till transaction-2 releases the exclusive lock on the data record X and then computes the aggregation giving correct results.

8.3.1 Serializable transactions
A given set of transactions is considered to be serializable, if it produces the same result, as though these transactions are executed serially one after the other. The concept of serializability was first introduced by Eswaran, Gray proved the two-phase locking protocol, which is briefly described as: If all transactions obey the “two-phase locking protocol”, then all possible interleaved schedules are serializable.
 * Individual transactions are correct as they transform a correct state of the database to another correct state
 * Executing transaction one at a time in any serial order is also correct, as individual transactions are independent of each other.
 * An interleaved execution is correct, if it is equivalent to serial execution or if it is serializable.

8.3.2 Two Phase Locking Protocol
After releasing a lock, a transaction must never go on to acquire any more locks. A transaction that obeys this protocol, thus has two phases, a lock acquisition or “Growing” phase and a lock releasing or “Shrinking” phase.

Two phase locking may limit the amount of concurrency that can occur in a schedule. This is because a transaction may not be able to release lock on data item after it is through with it. This is the price for guaranteeing serializability of all schedules without having to check the schedules themselves.

There are number of variations of two phase locking (2PL). The technique we described above is known as 2 phase locking.

8.3.2.1 Conservative 2 PL
This requires a transaction to lock all the item it accesses before the transaction starts, by declaring read-set and write-set. Read-set of a transaction is the set of all items that the transaction reads, and the write-set is the set of all items that it writes.

If any of the items in read-set or write-set cannot be locked, it waits for it to be released by other transaction and after acquiring all locks at once, it starts the transaction. Conservative 2 phase locking is deadlock free protocol. However, it is difficult to use in practice because of the need to declare the read-set and write-set, which is not possible practically in most of the situations.

Growing phase is before the transaction starts and shrinking phase starts as soon as the transaction ends in case of conservative 2 phase locking.

8.3.2.2 Strict 2 PL
The most popular variation of 2-phase locking is strict 2-phase locking. A transaction does not release any of its exclusive locks till the transaction either commits or rollbacks. This guarantees strict schedules, as no other transactions can read or write an item that is written by this transaction unless it is committed. Growing phase starts as soon as the transaction starts and shrinking on write locks happens during either transaction commits or rollbacks.

8.3.2.3 Rigorous 2 PL
This is more restrictive version of strict 2-phase locking protocol. A transaction does not release any of its shared and exclusive locks till the transaction either commits or rollbacks. This is the easiest 2-phase locking to implement, but gives less concurrency for reads.

Growing phase starts as soon as the transaction starts and shrinking happens during either transaction commit or rollback.

8.3.3 Lock Starvation
Lock starvation occurs when a transaction cannot proceed for an indefinite period of time while other transactions in the system continue to run normally. This can occur due to unfair lock scheduling algorithms which implements priority based locking. One solution for starvation is to have fair lock waiting scheme, such as first in first out (FIFO) queue.

Starvation can also occur when the deadlock algorithm, selects the same transaction repeatedly for abort, thereby never allowing it to finish. The algorithms shall be modified to use higher priorities for transactions that have been aborted multiple times to avoid this problem.

8.3.4 Dead Lock
Deadlock occurs when each transaction in a set of two or more transactions wait for some resource that is locked by some other transaction in the same set.

For example transaction T1 acquires Resource R1 and transaction T2 acquires resource R2. After this if T1 waits for R2 and T2 waits for R1. Both will never get the lock, and this situation is termed as deadlock.

TODO Diagram with respect to time

8.3.4.1 Dead Lock Prevention
There are many prevention protocols, but most of them are practically not possible in case of DBMS. They are conservative 2 phase locking, ordering data record locking, no waiting, cautious waiting, Wait-die and Wound-wait.

Conservative two-phase locking protocol is deadlock prevention protocol, in which all locks are acquired before the transaction works on the data records.

Ordering of data record locking will also prevent deadlocks. A transaction, which works on several data records, should obtain locks in pre-determined order always. This requires the programmer or DBMS aware of chosen order of the data record locks. This is also not practical to implement in database systems.

No Waiting Algorithm If a transaction is unable to obtain a lock, it is immediately aborted and then restarted after a certain time delay without checking whether a deadlock will actually occur or not. This can cause transactions to abort and restart needlessly. Cautious Waiting Algorithm

This is proposed to avoid needless restart in case of no waiting algorithm, If transaction T1 tries to lock an data record R1, but is not able to do so because R1 is locked by some other transaction T2 with a conflicting lock. If T2 is not blocked on some other locked data record, then T1 is blocked and allowed to wait; otherwise abort T1.

Wait-Die and Wound-Wait Algorithm The other two techniques, wait-die and wound-wait use transaction timestamps as basis to determine what to do in case of deadlocks. Transaction timestamp in a unique identifier assigned to each transaction. These timestamps are generally running counter which gets incremented for every transaction started. If transaction T1, starts before transaction T2, then TS(T1) < TS(T2)

Suppose that transaction T1 tries to lock data record R1, but is not able to lock because R1 is locked by some other transaction T2 with a conflicting lock. Rules followed by these schemes are as follows

Wait-Die – If TS (T1) < TS (T2), then T1 is allowed to wait, otherwise abort T1 and restart it later with the same timestamp. Wound-Wait – If TS (T1) < TS (T2), then abort T1 and restart it later with same timestamp; otherwise T1 is allowed to wait

In wait-die, older transaction is allowed to wait on younger transaction, whereas a younger transaction requesting lock on record R1 held by an older transaction is aborted and restarted. The wound-wait approach does the opposite; a younger transaction is allowed to wait on an older one, whereas an older transaction requesting lock on record R1 held by an younger transaction preempts the younger transaction by aborting it. Both schemes end up aborting the younger of the two transactions that may be involved in a deadlock. In wait-die, transactions wait only on younger transactions. In wound-wait, transactions wait only on older transactions. So no cycle is created in both of these schemes avoiding deadlocks.

8.3.4.2 Dead Lock Detection
Deadlock detection is more practical approach than the deadlock prevention techniques. This first checks whether deadlock state actually exist in the system before taking any actions.

A simple way to detect a state of deadlock is for the system to construct and maintain a “wait-for” graph TODO-Diagram and explanation

If the system is in a state of deadlock, some of the transactions causing deadlocks must be aborted. Either application or DBMS should select one of the transactions involved in deadlock for rollback to get the system out of deadlock situation. This selection algorithm should consider avoiding transactions that are running for long time and transactions that have performed many updates. The best transactions to be aborted are the SELECT or read only transactions.

8.3.4.3 Timeouts
The simplest solution for handling deadlocks is timeouts. In this method, transactions that wait for longer than the system defined timeout period, are assumed to be in deadlock situation and are aborted.

8.4. Isolation Levels
TODO

8.5 Lock Granularity
The size of the data item is often called the data item granularity. Fine granularity refers to small data item sizes, whereas coarse granularity refers to large item sizes.

Larger the data item size, lower the degree of concurrency. For example if the data item size is a ‘Table’ denoted by Table1, a transaction T1 that needs to lock a record X must lock the whole table Table1 that contains record X because the lock is associated with the whole data item, Table1. If another transaction T2 wants to lock a different record Y of Table1, it is forced to wait till T1 releases the lock on Table1. If the data item size is single record, then transaction T2 would be able to proceed, because it would lock different data item.

Smaller the data item size, more the number of items in the database. Because every item is associated with a lock, the system will have a larger number of active locks. More lock and unlock operations will be performed, causing a higher overhead. In addition, more storage space is required for storing these locks.

For large transactions, which access many records, coarse granularity should be used and for small transactions, which access small number of records, fine granularity should be used.

8.5.1 Granularity Levels in DBMS
Granularity levels are listed below ordered from Coarse to fine granularity
 * Database
 * Table
 * Disk Block or Memory Page
 * Record
 * Record Field

Since the best granularity size depends on the given transaction, DBMS should support multiple level so granularity and allows the transaction to pick any level it wants.

8.5.2 Intent Locks
Let us take an example database DB1, has one table Table1, having 2 pages P1 and P2. P1 has 10 records R1 to R10, and P2 has 10 records R11 to R20.

TODO::Draw tree structure denoting above.

Scenario 1:

Transaction T1 wants to update all records in Table1. It will request for exclusive lock on Table1. This is beneficial for T1 than acquiring 20 locks for each data record. Now suppose, another transaction T2 wants to read record R5 from page P1, then T2 would request a shared record level lock on R5. DBMS will now check for the compatibility of the requested lock with already held locks. One way to verify this is to traverse the tree from leaf R5 to root DB1 and check for conflicting locks.

Scenario 2:

Transaction T1 wants to read record R5 from page P1, and then T2 would request a shared record level lock on R5. Now suppose, another transaction T2 wants to update all records in Table1, so it will request exclusive lock on Table1. DBMS will now check for the compatibility of the requested lock with already held locks. For this it needs to check all locks at page level and record level to ensure that there are not conflicting locks.

For both the above scenarios, traversal-based lock conflict detection is very inefficient and would defeat the purpose of having multiple granularity locking.

New types of locks are introduced to make the multiple granularity locking efficient. The idea behind intention locks is for a transaction to indicate, along the path from the root to the desired node, what type of lock it will require from one of the node’s descendants.

There are three types of intension locks.
 * Intention Shared (IS)
 * Intention Exclusive (IX)
 * Shared Intention Exclusive (SIX)

Intention –Shared Locks

Indicates that a shared lock will be requested on some descendant node

Intention –Exclusive Locks

Indicates that a exclusive lock will be requested on some descendant node

Intention –Shared Locks

Indicates that this node is locked in shared mode and exclusive lock will be requested on some descendant node

Compatibility Table

Mode	IS	IX	S	SIX	X IS	Yes	Yes	Yes	Yes	No IX	Yes	Yes	No	No	No S	Yes	No	Yes	No	No SIX	Yes	No	No	No	No X	No	No	No	No	No

TODO::diagram for above lock compatibility table

Locking protocol
 * 1) 	The root of the tree must be locked first
 * 2) 	A node can be locked by transaction in S or IS mode only if the parent node is already locked by transaction in either IS or IX mode
 * 3) 	A node can be locked by transaction in X, IX or SIX modes only if the parent of the node already locked by transaction in either IX or SIX modes
 * 4) 	A transaction can unlock a node, only if none of the children of node are currently locked by transaction.
 * 5) 	Lock compatibility and 2 phase locking should be adhered.

TODO::Example illustrating above protocol and compatibility table: Refer 445 of Elmasri

8.6 Index and Predicate Locking
One solution to phantom record problem is to use index locking.

A more general technique, called predicate locking would lock access to all records that satisfy a predicate or where condition. Predicate locks have proved to be difficult to implement effectively.

8.7 Timestamp based concurrency control (TODO:Rephrase whole section)
There is another concurrency control technique based on timestamp ordering, which avoids use of locks. As it avoids locks, deadlocks do not occur in this concurrency control mechanism.

8.7.1 Timestamps
Timestamp is a unique identifier created by DBMS to identify a transaction. Typically, timestamp values are assigned in the order in which the transactions are submitted to the system. Generally it is the transaction start time and referred as TS (T). These unique identifiers shall be implemented using simple counters, which gets incremented when a transaction is started. As is has a finite maximum value, algorithm should take care of resetting it when it reaches maximum.

8.7.2 Basic Timestamp ordering
Each data item X, has two timestamps


 * ReadTS(X) – The read timestamp of data item X; this is the largest timestamp among all the timestamps of the transactions that have successfully read the item X.
 * WriteTS(X) – The write timestamp of data item X; this is the largest timestamp among all the timestamps of the transactions that have successfully modified the item X.

Whenever some transaction T tries to issue readItem(X) or writeItem(X), the algorithm should compare the timestamp of T with ReadTS(X) and WriteTS(X) to ensure that the timestamp order of the transaction execution is not violated. If this order is violated, then transaction T is aborted and resubmitted to the system as a new transaction with a new timestamp. If T is aborted, then any transaction T1 that may have used a value written by T must also be aborted. Similarly any transaction T2 that may have used a value written by T1 must also be aborted and so on. This effect is known as cascading rollback and is one of the biggest problems associated with this scheme.

The basic timestamp-ordering algorithm is summarized below

Transaction T issues writeItem(X) operation: If readTS(X) > TS(T), or writeTS(T) > TS(T), then abort T, else execute writeItem(X) of T and set writeTS(X) to TS(T)

Transaction T issues readItem(X) operation: If writeTS(T) > TS(T), then abort T, else execute readItem(X) of T and set readTS(X) to largest of TS(T) and current readTS(X)

Whenever the basic timestamp ordering algorithm, detects two conflicting operations that occur in the incorrect order, it rejects the later of the two operations by aborting the transaction that issued it. The schedules produced by this algorithm are guaranteed to be conflict serializable, like the 2 phase locking protocol.

8.7.3 Strict Timestamp ordering
Strict Timestamp ordering is a variation of basic timestamp ordering that ensures that the schedules are recoverable and conflict serializable. In this variation, a transaction T, that issues a readItem(X) or writeItem(X) such that TS (T) > writeTS(X) has its read or write operation delayed until the transaction T1 that wrote the value of X has committed or aborted. To implement this algorithm, locking is required. This algorithm does not cause deadlock, since T waits for T1 only if TS(T) > TS(T1).

TODO::refer white paper and add more content

8.8 Multi Version concurrency control
TODO

8.9 Optimistic concurrency control
In both concurrency control techniques, locking and timestamp ordering, certain checks are made before a transaction operates on a data item. In locking, check is done to determine whether the item being accessed is locked. In timestamp ordering, the transaction timestamp is checked against the read and write timestamps of the data item. This imposes an overhead to the transaction execution. In optimistic concurrency control techniques, no checking is done while the transaction is executing. In this scheme, updates in the transaction are not applied directly to the data items until the transaction reaches its end. During transaction execution, all updates are applied to local copies of the data items, which are kept on per transaction basis. At the end of the transaction execution, a validation phase checks whether any of the transaction’s updates violate serializability. If it is not violated, then the transaction is committed and the database is updated from the local copies, otherwise the transaction is aborted and then restarted later. There are three phases in this protocol

This protocol suits well incase of minimal interference between transaction on data items. If the interference is more, then transactions will be restarted often. This technique is called ‘optimistic’ because they assume that little interference will occur and hence that there is no need to do checking during transaction execution.
 * Read Phase – A transaction can read values of committed data items from the database. However updates are applied only to local copies of the data items kept in transaction workspace.
 * Validation Phase - Checking is performed to ensure that serializability will not be violated if the transaction updates are applied to the database.
 * Write Phase – Transaction updates are applied to the database if the validation phase says that it is serializable. Otherwise the updates are discarded and the transaction is restarted.

TODO::refer white paper and add more content

8.10 Architecture for lock manager
Concurrency control in traditional database systems aims to maintain database consistency. Concurrency control in MMDB is difficult due to the conflicting requirements of satisfying timing constraints and maintaining data consistency.