Algorithms/Randomization

As deterministic algorithms are driven to their limits when one tries to solve hard problems with them, a useful technique to speed up the computation is randomization. A randomized algorithm has access to a source of randomness, which can be imagined as tossing coins during the computation; depending on the outcome of a toss, the algorithm may split up its computation path.

There are two main types of randomized algorithms: Las Vegas algorithms and Monte Carlo algorithms. A Las Vegas algorithm may use randomness to speed up the computation, but it must always return the correct answer for its input. A Monte Carlo algorithm does not have that restriction: it is allowed to return a wrong answer. However, the probability of returning a wrong answer must be small, otherwise the Monte Carlo algorithm would not be of any use.

Many approximation algorithms use randomization.

Ordered Statistics
Before covering randomized techniques, we'll start with a deterministic problem whose solution leads to a problem that utilizes randomization. Suppose you have an unsorted array of values and you want to find


 * the maximum value,
 * the minimum value, and
 * the median value.

In the immortal words of one of our former computer science professors, "How can you do?"

find-max
First, it's relatively straightforward to find the largest element:

// find-max -- returns the maximum element
function find-max(array vals[1..n]): element
    let result := vals[1]
    for i from 2 to n:
        result := max(result, vals[i])
    repeat
    return result
end

An initial assignment of $$-\infty$$ to result would work as well, but it would waste a call to the max function, since the first element compared would simply become result anyway. By initializing result to vals[1], the function requires only n-1 comparisons. (Moreover, in generic code the data type may not be strictly numerical, and there might be no good way of assigning $$-\infty$$; using vals[1] is type-safe.)

A similar routine to find the minimum element can be written by calling the min function instead of the max function.

find-min-max
But now suppose you want to find the min and the max at the same time; here's one solution:

// find-min-max -- returns the minimum and maximum element of the given array
function find-min-max(array vals): pair
    return pair {find-min(vals), find-max(vals)}
end

Because find-max and find-min both make n-1 calls to the max or min functions (when vals has n elements), the total number of comparisons made in find-min-max is $$2n-2$$.

However, some redundant comparisons are being made. These redundancies can be removed by "weaving" together the min and max functions:

// find-min-max -- returns the minimum and maximum element of the given array
function find-min-max(array vals[1..n]): pair
    let min := $$\infty$$
    let max := $$-\infty$$
    if n is odd:
        min := max := vals[1]
        vals := vals[2,..,n]       // we can now assume n is even
        n := n - 1
    fi
    for i := 1 to n by 2:          // consider pairs of values in vals
        if vals[i] < vals[i + 1]:
            let a := vals[i]
            let b := vals[i + 1]
        else:
            let a := vals[i + 1]
            let b := vals[i]       // invariant: a <= b
        fi
        if a < min: min := a fi
        if b > max: max := b fi
    repeat
    return pair {min, max}
end

Here, we only loop $$n/2$$ times instead of n times, but each iteration makes three comparisons. Thus, the number of comparisons made is $$(3/2)n = 1.5n$$, so this version makes only about three-quarters as many comparisons as the original algorithm.

Only three comparisons need to be made instead of four because, by construction, it's always the case that $$a\le b$$. (In the first part of the "if", we actually know more specifically that $$a < b$$, but under the else part, we can only conclude that $$a\le b$$.) This property is utilized by noting that a doesn't need to be compared with the current maximum, because b is already greater than or equal to a, and similarly, b doesn't need to be compared with the current minimum, because a is already less than or equal to b.
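For concreteness, the pairing idea can also be written out in Python. This is a minimal sketch; the function name and style are ours, not part of the pseudocode above:

def find_min_max(vals):
    # pair up elements: one comparison orders the pair, then the smaller is
    # checked only against the current minimum and the larger only against
    # the current maximum, giving three comparisons per pair instead of four
    if not vals:
        raise ValueError("empty array")
    if len(vals) % 2 == 1:
        lo = hi = vals[0]      # odd length: seed both from the first element
        start = 1
    else:
        lo, hi = min(vals[0], vals[1]), max(vals[0], vals[1])
        start = 2
    for i in range(start, len(vals) - 1, 2):
        a, b = vals[i], vals[i + 1]
        if a > b:
            a, b = b, a        # invariant: a <= b
        if a < lo:
            lo = a
        if b > hi:
            hi = b
    return lo, hi

print(find_min_max([3, 1, 4, 1, 5, 9, 2, 6]))   # prints (1, 9)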

In software engineering, there is a struggle between using libraries versus writing customized algorithms. In this case, the min and max functions weren't reused, in order to get a faster find-min-max routine. Such an operation would probably not be the bottleneck in a real-life program; however, if testing reveals that the routine should be faster, such an approach should be taken. Typically, the solution that reuses libraries is better overall than writing a customized solution. Techniques such as open implementation and aspect-oriented programming may help manage this tension to get the best of both worlds, but regardless, it's a useful tradeoff to recognize.

find-median
Finally, we need to consider how to find the median value. One approach is to sort the array and then extract the median from position vals[n/2]:

// find-median -- returns the median element of vals
function find-median(array vals[1..n]): element
    assert (n > 0)
    sort(vals)
    return vals[n / 2]
end

If our values cannot be sorted by a radix-style sort (for example, because they are not integers drawn from a small range), the sort above is going to require $$O(n\log n)$$ steps.

However, it is possible to extract the kth order statistic in $$O(n)$$ time. The key is eliminating the sort: we don't actually require the entire array to be sorted in order to find the median, so there is some waste in sorting the entire array first. One technique we'll use to accomplish this is randomness.

Before presenting a non-sorting find-median function, we introduce a divide and conquer-style operation known as partitioning. What we want is a routine that finds a random element in the array and then partitions the array into three parts:


 * 1) elements that are less than or equal to the random element;
 * 2) elements that are equal to the random element; and
 * 3) elements that are greater than or equal to the random element.

These three sections are denoted by two integers returned by the routine, j and i: elements at positions up to and including i are less than or equal to the picked element, elements at positions from j onward are greater than or equal to it, and the elements strictly between positions i and j are equal to it. The partitioning is performed "in place" in the array:

// partition -- break the array into three partitions based on a randomly picked element
function partition(array vals): pair{j, i}

Note that when the random element picked is actually represented three or more times in the array it's possible for entries in all three partitions to have the same value as the random element. While this operation may not sound very useful, it has a powerful property that can be exploited: When the partition operation completes, the randomly picked element will be in the same position in the array as it would be if the array were fully sorted!

This property might not sound so powerful, but recall the optimization for the find-min-max function: we noticed that by picking elements from the array in pairs and comparing them to each other first we could reduce the total number of comparisons needed (because the current min and max values need to be compared with only one value each, and not two). A similar concept is used here.

While the code for partition is not magical, it has some tricky boundary cases:

// partition -- break the array into three ordered partitions based on a randomly picked element
function partition(array vals): pair{j, i}
    let m := 0
    let n := vals.length - 2   // vals[vals.length-1] is the last element; it will hold the partition
                               // element, so the last element to sort is vals[vals.length-2]
    let irand := random(m, n)  // returns any value from m to n
    let x := vals[irand]
    swap(irand, n + 1)         // n+1 = vals.length-1, the rightmost element, which stores the
                               // partition element and acts as a sentinel for m
    // invariant: values in vals[n+1..] are greater than or equal to x
    // invariant: values in vals[0..m-1] are less than or equal to x
    while m <= n               // see the explanation in quicksort for why this is m <= n rather
                               // than m < n: in the 2-element case, vals.length-2 = 0 = n = m,
                               // and the in-order and out-of-order cases need different actions;
                               // those actions occur within this loop, so the m = n case must be
                               // processed before exiting
        while vals[m] < x      // in the 2-element case, the second element is the partition and
            m++                // the first element is at m; if in-order, m increments to the partition
        endwhile
        while x < vals[n] && n > 0   // stops if vals[n] belongs in the left partition,
            n--                      // or when it hits the start of the array
        endwhile
        if m >= n: break fi
        swap(m, n)             // exchange vals[n] and vals[m]
        m++                    // don't rescan the swapped elements
        n--
    endwhile
    // partition: [0..m-1]  []  [n+1..], and note that m = n+1
    swap(m, vals.length - 1)   // put the partition element between the left and right partitions
    // in the 2-element out-of-order case, m = 0 (not incremented in the loop),
    // so the first and last (second) elements are swapped
    // partition: [0..m-1]  [m..m]  [m+1..]
    return pair {m + 1, m - 1} // j = m+1 starts the right partition, i = m-1 ends the left
end

We can use partition as a subroutine for a general find operation:

// find -- moves elements in vals such that location k holds the value it would when sorted
function find(array vals[0,..,n-1], integer k)
    assert (0 <= k < n)        // k must be a valid index
    if n <= 1:
        return
    fi
    let pair (j, i) := partition(vals)
    if k <= i:
        find(vals[0,..,i], k)
    else-if j <= k:
        find(vals[j,..,n-1], k - j)
    fi
end

Which leads us to the punch-line:

// find-median -- returns the median element of vals
function find-median(array vals): element
    assert (vals.length > 0)
    let median_index := vals.length / 2
    find(vals, median_index)
    return vals[median_index]
end
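The same idea reads naturally in Python. The sketch below trades the in-place partitioning for clarity by building sublists, so it uses extra space; the function names are ours:

import random

def quickselect(vals, k):
    # returns the element that would sit at index k if vals were sorted
    assert 0 <= k < len(vals)
    pivot = random.choice(vals)          # the random pick is the whole trick
    left  = [x for x in vals if x < pivot]
    mid   = [x for x in vals if x == pivot]
    right = [x for x in vals if x > pivot]
    if k < len(left):
        return quickselect(left, k)
    if k < len(left) + len(mid):
        return pivot                     # k falls in the "equal" partition
    return quickselect(right, k - len(left) - len(mid))

def find_median(vals):
    return quickselect(vals, len(vals) // 2)

print(find_median([3, 1, 4, 1, 5, 9, 2, 6]))   # prints 4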

One consideration that might cross your mind is "is the random call really necessary?" For example, instead of picking a random pivot, we could always pick the middle element instead. Given that our algorithm works with all possible arrays, we could conclude that the running time on average for all of the possible inputs is the same as our analysis that used the random function. The reasoning here is that under the set of all possible arrays, the middle element is going to be just as "random" as picking anything else. But there's a pitfall in this reasoning: Typically, the input to an algorithm in a program isn't random at all. For example, the input has a higher probability of being sorted than just by chance alone. Likewise, because it is real data from real programs, the data might have other patterns in it that could lead to suboptimal results.

To put this another way: for the randomized median-finding algorithm, there is a very small probability that it will run suboptimally, independent of what the input is; while for a deterministic algorithm that just picks the middle element, there is a greater chance that it will run poorly on some of the most frequent input types it will receive. This leads us to the following guideline: if your algorithm depends upon randomness, be sure you introduce the randomness yourself instead of depending on the data to be random.

Note that there are "derandomization" techniques that can take an average-case fast algorithm and turn it into a fully deterministic algorithm. Sometimes the overhead of derandomization is so great that it requires very large datasets to get any gains. Nevertheless, derandomization in itself has theoretical value.

The randomized find algorithm was invented by C. A. R. "Tony" Hoare. While Hoare is an important figure in computer science, he may be best known in general circles for his quicksort algorithm, which we discuss in the next section.

Quicksort
The median-finding partitioning algorithm in the previous section is actually very close to the implementation of a full-blown sorting algorithm. Building a quicksort algorithm is left as an exercise for the reader, and is recommended first, before reading the next section. (Quicksort is diabolical compared to merge sort, which is a sort not improved by a randomization step.)

A key part of quicksort is choosing the right pivot, ideally the median. But to get it up and running quickly, start with the assumption that the array is unsorted and that the rightmost element of each subarray is as likely to be the median as any other element, being entirely optimistic that the rightmost element doesn't happen to be the largest key, which would mean we would be removing only one element (the partition element) at each step, leaving no right subarray to sort and an (n-1)-element left subarray to sort.

This is where randomization is important for quicksort: choosing a more optimal partition key is pretty important for quicksort to work efficiently.

Compare the number of comparisons that are required for quicksort versus insertion sort.

With insertion sort, the average number of comparisons for finding the lowest first element in an ascending sort of a randomized array is $$n/2$$.

The second element's average number of comparisons is $$(n-1)/2$$; the third element's, $$(n-2)/2$$.

The total number of comparisons is $$[\,n + (n-1) + (n-2) + \cdots + 1\,]/2 = n(n+1)/4$$, which is $$O(n^2)$$.

In quicksort, the number of comparisons will halve at each partition step if the true median is chosen, since the left half partition doesn't need to be compared with the right half partition; but at each step, the total number of elements across all the partitions created by the previous level of partitioning is still n.

The number of levels of comparing n elements is the number of steps of dividing n by two until n = 1; in reverse, $$2^m \approx n$$, so $$m = \log_2 n$$.

So the total number of comparisons is n (elements) times m (levels of scanning), or $$n\log_2 n$$.

So the number of comparisons is $$O(n\log_2 n)$$, which is smaller than insertion sort's $$O(n^2)$$.

(Comparing $$O(n\log_2 n)$$ with $$O(n^2)$$, the common factor n can be eliminated, and the comparison is $$\log_2 n$$ versus n, which becomes exponentially different as n grows larger: for $$n = 2^{16}$$ it is 16 versus 65536, and for $$n = 2^{32}$$ it is 32 versus about 4 billion.)
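These counts can be checked empirically. Below is a hedged Python sketch that instruments both sorts; for quicksort it counts one comparison per element per partition level, which is the approximation used in the argument above, and all names are ours:

import random

def insertion_sort_comparisons(a):
    a = a[:]                      # sort a copy, counting key comparisons
    count = 0
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0:
            count += 1            # one comparison of key with a[j]
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return count

def quicksort_comparisons(a):
    count = 0
    def qsort(xs):
        nonlocal count
        if len(xs) <= 1:
            return xs
        pivot = random.choice(xs)
        count += len(xs)          # ~one pivot comparison per element per level
        return (qsort([x for x in xs if x < pivot])
                + [x for x in xs if x == pivot]
                + qsort([x for x in xs if x > pivot]))
    qsort(a)
    return count

data = [random.random() for _ in range(1024)]
print(insertion_sort_comparisons(data))   # roughly n^2/4, around 260000
print(quicksort_comparisons(data))        # roughly n*log2(n), around 10000 to 15000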

To implement the partitioning in-place on a part of the array determined by a previous recursive call, what is needed is a scan from each end of the part, swapping whenever the value at the left scan's current location is greater than the partition value and the value at the right scan's current location is less than the partition value. The initial step is to assign the partition value to the rightmost element, swapping if necessary.

The partitioning step is then: increment the left scan pointer while the current value is less than the partition value; decrement the right scan pointer while the current value is more than the partition value, or until it reaches the leftmost location; exit if the pointers have crossed (l >= r); otherwise, swap the values where the left and right pointers have stopped (the left pointer's value being greater than the partition value, and the right pointer's value being less than it). Finally, after exiting the loop because the left and right pointers have crossed, swap the rightmost partition value with the value at the left scan pointer's final location, so the partition element ends up between the left and right partitions.

Make sure at this point that, after the final swap, the cases of a two-element in-order array and a two-element out-of-order array are handled correctly, which should mean all cases are handled correctly. This is a good debugging step for getting quicksort to work.

For the in-order two-element case, the left pointer stops on the partition (second) element, because the partition value is found. The right pointer, scanning backwards, starts on the first element, before the partition, and stops because it is in the leftmost position.

The pointers cross, and the loop exits before doing a loop swap. Outside the loop, the contents of the left pointer at the rightmost position and the partition, also at the rightmost position, are swapped, making no change to the in-order two-element case.

For the out-of-order two-element case, the left pointer scans and stops at the first element, because it is greater than the partition value (the left scan stops on values greater than the partition value, which need to be swapped).

The right pointer starts and stops at the first element, because it has reached the leftmost element.

The loop exits because the left and right pointers are equal, at the first position; the contents of the left pointer at the first position and the partition at the rightmost (other) position are swapped, putting the previously out-of-order elements into order.

Another implementation issue is how to move the pointers during scanning. Moving them at the end of the outer loop seems logical.

partition(a, l, r) {
    v = a[r];
    i = l;
    j = r - 1;
    while (i <= j) {
        // scan while i <= j rather than i < j so that, in the two-element
        // in-order case, i is incremented to the partition and nothing
        // happens in the final swap with the partition at r
        while (a[i] < v) ++i;
        while (v <= a[j] && j > l) --j;
        if (i >= j) break;
        swap(a, i, j);   // exchange a[i] and a[j]
        ++i;             // don't rescan the swapped elements
        --j;
    }
    swap(a, i, r);       // put the partition element between the partitions
    return i;
}

With the pre-increment/decrement unary operators, scanning can be done just before testing, within the test conditions of the while loops, but this means the pointers should be offset by -1 and +1 respectively at the start; the algorithm then looks like:

partition(a, l, r) {
    v = a[r];    // v is the partition value, at a[r]
    i = l - 1;
    j = r;
    while (true) {
        while (a[++i] < v);
        while (v <= a[--j] && j > l);
        if (i >= j) break;
        swap(a, i, j);
    }
    swap(a, i, r);
    return i;
}

And the qsort algorithm is:

qsort(a, l, r) {
    if (l >= r) return;
    p = partition(a, l, r);
    qsort(a, l, p - 1);
    qsort(a, p + 1, r);
}

Finally, the randomization of the partition element:

random_partition(a, l, r) {
    p = random_int(r - l) + l;
    // median of a[l], a[p], a[r]
    if (a[p] < a[l]) p = l;
    if (a[r] < a[p]) p = r;
    swap(a, p, r);
}

This can be called just before calling partition in qsort.

Shuffling an Array
// shuffle -- produce a random permutation of inputArray
temporaryArray = { }                  // keeps the data in during the shuffle
usedItemArray = { }                   // records whether a position has been filled
itemNum = 0
while (itemNum != lengthOf(inputArray)) {
    usedItemArray[itemNum] = false    // no positions have been used yet
    itemNum = itemNum + 1
}
itemNum = 0                           // we'll use this again
while (itemNum != lengthOf(inputArray)) {
    itemPosition = randomNumber(0, lengthOf(inputArray) - 1)
    while (usedItemArray[itemPosition] != false) {
        itemPosition = randomNumber(0, lengthOf(inputArray) - 1)
    }
    usedItemArray[itemPosition] = true     // mark the position as taken
    temporaryArray[itemPosition] = inputArray[itemNum]
    itemNum = itemNum + 1
}
inputArray = temporaryArray
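The version above keeps re-rolling whenever it lands on an already-used position, which slows down as the array fills up. A common in-place alternative is the Fisher-Yates shuffle; here is a minimal Python sketch of it (not part of the pseudocode above):

import random

def fisher_yates_shuffle(a):
    # walk backwards; swap each position with a uniformly random
    # position at or below it (possibly itself)
    for i in range(len(a) - 1, 0, -1):
        j = random.randint(0, i)
        a[i], a[j] = a[j], a[i]
    return a

print(fisher_yates_shuffle(list(range(10))))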

Equal Multivariate Polynomials
[TODO: as of now, there is no known deterministic polynomial time solution, but there is a randomized polytime solution. The canonical example used to be IsPrime, but a deterministic, polytime solution has been found.]
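The randomized solution rests on the Schwartz-Zippel lemma: a nonzero polynomial of total degree d, evaluated at a point drawn uniformly from a set S, evaluates to zero with probability at most d/|S|. A hedged Python sketch of the resulting Monte Carlo equality test, assuming the polynomials are given as black-box functions of two variables (the names and parameters are ours):

import random

def probably_equal(p, q, degree, trials=20):
    # if p - q is not identically zero, a random evaluation over a domain
    # much larger than the degree exposes the difference with probability
    # at least 1 - degree/domain on each trial
    domain = 100 * degree
    for _ in range(trials):
        x = random.randint(0, domain)
        y = random.randint(0, domain)
        if p(x, y) != q(x, y):
            return False          # definitely not equal
    return True                   # equal, with high probability

# (x + y)^2 versus x^2 + 2xy + y^2: reports True
print(probably_equal(lambda x, y: (x + y) ** 2,
                     lambda x, y: x * x + 2 * x * y + y * y, degree=2))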

Hash tables
Hashing relies on a hashcode function to distribute keys evenly among the available slots. In Java, this is done in a fairly straightforward way: a common idiom starts a running hash at a small constant such as 17 and repeatedly multiplies it by a moderate-sized prime, 31, adding in each component; for string keys, the hash is built by multiplying the running hash by 31 and adding each character's ordinal value. The table index is then the hash taken modulo the size of the hash table.
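A sketch of that polynomial string hash in Python (the constants 17 and 31 follow the Java idiom described above; the function name is ours):

def string_hash(key, table_size):
    # h = (((17 * 31 + c0) * 31 + c1) * 31 + ...) mod table_size
    h = 17
    for ch in key:
        h = 31 * h + ord(ch)
    return h % table_size

print(string_hash("algorithm", 101))   # an index in [0, 100]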

The wikibook Data Structures/Hash Tables chapter covers the topic well.

Skip Lists
[TODO: Talk about skips lists. The point is to show how randomization can sometimes make a structure easier to understand, compared to the complexity of balanced trees.]

Dictionary or Map is a general concept where a value is inserted under some key and retrieved by the key. In some languages the dictionary concept is built in (Python); in others it is in core libraries (the C++ STL, and the Java standard collections library). These libraries usually let the programmer choose between a hash-based implementation and a balanced binary tree implementation (red-black trees). Recently, skip lists have been offered as well, because they can be implemented to be highly concurrent for multithreaded applications.

Hashing is a technique that depends on randomness of keys when passed through a hash function, to find a hash value that corresponds to an index into a linear table. Hashing works as fast as the hash function, but works well only if the inserted keys spread out evenly in the array; any keys that hash to the same index have to be dealt with as a hash collision problem, e.g. by keeping a linked list of collisions for each slot in the table and iterating through the list to compare the full key of each key-value pair against the search key.

The disadvantage of hashing is that in-order traversal is not possible with this data structure.

Binary trees can be used to represent dictionaries, and in-order traversal of binary trees is possible by visiting nodes recursively (visit left child, visit current node, visit right child). Binary trees can suffer from poor search performance when they are "unbalanced", e.g. when the key-value pairs were inserted in ascending or descending key order, so that the tree effectively looks like a linked list with no left children and all right children. Self-balancing of binary trees can be done probabilistically (using randomness) or deterministically (using child link coloring as red or black), through local three-node tree rotation operations. A rotation simply swaps a parent with a child node while preserving key order: for a left-child rotation, the left child's right child becomes the parent's left child, and the parent becomes the left child's right child.

Red-black trees can be understood more easily if the corresponding 2-3-4 trees are examined. A 2-3-4 tree is a tree where nodes can have 2, 3, or 4 children, with 3-child nodes holding 2 keys between the 3 children, and 4-child nodes holding 3 keys between the 4 children. 4-nodes are actively split into three single-key 2-nodes, with the middle 2-node passed up to be merged with the parent node, which, if a one-key 2-node, becomes a two-key 3-node, or, if a two-key 3-node, becomes a 4-node, which will later be split (on the way up). The act of splitting a three-key 4-node is actually a re-balancing operation that prevents a string of three nodes (grandparent, parent, child) from occurring without a balancing rotation happening. 2-3-4 trees are a limited example of B-trees, which usually have enough keys per node to fill a physical disk block, to facilitate caching of very large indexes that can't fit in physical RAM (which is much less common nowadays).

A red-black tree is a binary tree representation of a 2-3-4 tree, where 3-nodes are modeled by a parent with one red child, and 4-nodes by a parent with two red children. Splitting of a 4-node is represented by the parent with two red children flipping the children to black and itself to red. There is never a case where the parent is already red, because balancing operations also occur: if there is a grandparent with a red parent with a red child, the grandparent is rotated to be a child of the parent, the parent is made black, and the grandparent is made red; this unifies with the previous flipping scenario of a 4-node represented by two red children. In fact, it may be this standardization of 4-nodes, with mandatory rotation of skewed or zigzag 4-nodes, that results in the re-balancing of the binary tree.

A newer optimization is to left-rotate any single right red child into a single left red child, so that only right rotations of left-skewed inline 4-nodes (three red nodes in a line) would ever occur, simplifying the re-balancing code.

Skip lists are modeled after singly linked lists, except that nodes are multilevel. Tall nodes are rarer, but the insert operation ensures that nodes are connected at each level.

Implementation of skip lists requires creating randomly high multilevel nodes, and then inserting them.

Nodes are created by iterating a random function: a node reaches a higher level when it survives more successive random thresholds (e.g. each draw must fall below 0.5, if the random value is between 0 and 1), so taller nodes occur later in the iteration and are rarer.

Insertion requires a temporary previous-node array with the height of the generated inserting node. It is used to store, for each level, the last pointer to a node whose key is less than the insertion key.

The scanning begins at the head of the skip list, at the highest level of the head node, and proceeds across until a node is found with a key higher than the insertion key; the previous pointer is stored in the temporary previous-node array. Then the next lower level is scanned from that node, and so on, walking zig-zag down, until the lowest level is reached.

Then a list insertion is done at each level of the temporary previous-node array: the previous node's next node at each level becomes the next node at that level for the inserting node, and the inserting node becomes the previous node's next node.

Search involves iterating from the highest level of the head node to the lowest level, scanning along the next pointers at each level until a node greater than the search key is found, then moving down to the next level and proceeding with the scan, until either a higher-keyed node has been reached at the lowest level (the key is absent) or the search key is found.

The creation of less frequent-when-taller, randomized height nodes, and the process of linking in all nodes at every level, is what gives skip lists their advantageous overall structure.

A method of skip list implementation: implement a lookahead singly-linked list, then test; then transform it into a skip list implementation and run the same tests; then compare performance
What follows is an implementation of skip lists in Python. A singly linked list that always examines the next node (rather than the current node) is implemented first; the skip list implementation then follows, attempting minimal modification of the former, and comparison helps clarify the implementation.
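A minimal Python sketch of such a skip list, assuming comparable keys and a fixed maximum height (the class and method names are ours):

import random

class Node:
    def __init__(self, key, value, height):
        self.key = key
        self.value = value
        self.next = [None] * height            # one forward pointer per level

class SkipList:
    MAX_HEIGHT = 16

    def __init__(self, p=0.5):
        self.p = p
        self.head = Node(None, None, self.MAX_HEIGHT)   # head is full height
        self.height = 1                                 # highest level in use

    def _random_height(self):
        # taller nodes are geometrically rarer: each extra level must
        # survive another random threshold
        h = 1
        while random.random() < self.p and h < self.MAX_HEIGHT:
            h += 1
        return h

    def search(self, key):
        node = self.head
        for lvl in reversed(range(self.height)):        # walk zig-zag down
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
        node = node.next[0]
        return node.value if node is not None and node.key == key else None

    def insert(self, key, value):
        prev = [self.head] * self.MAX_HEIGHT   # last node seen at each level
        node = self.head
        for lvl in reversed(range(self.height)):
            while node.next[lvl] is not None and node.next[lvl].key < key:
                node = node.next[lvl]
            prev[lvl] = node
        h = self._random_height()
        self.height = max(self.height, h)
        new = Node(key, value, h)
        for lvl in range(h):                   # splice in at every level
            new.next[lvl] = prev[lvl].next[lvl]
            prev[lvl].next[lvl] = new

s = SkipList()
for k in [3, 1, 4, 5, 9]:
    s.insert(k, str(k))
print(s.search(4))   # prints '4'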

Role of Randomness
The idea of making higher nodes geometrically rarer means there are fewer keys to compare with the higher the level of comparison, and since these are randomly selected, this should get rid of the problems of degenerate input that make tree balancing necessary in tree algorithms. Since the higher-level lists have more widely separated elements, but the search algorithm moves down a level after each search terminates at a level, the higher levels help "skip" over the need to search earlier elements in the lower lists. Because there are multiple levels of skipping, it becomes unlikely that a meagre skip at a higher level won't be compensated by better skips at lower levels, and Pugh claims $$O(\log n)$$ performance overall.

Conceptually, is it easier to understand than balancing trees, and hence easier to implement? The development of ideas from binary trees, balanced binary trees, 2-3 trees, red-black trees, and B-trees makes a stronger conceptual network, but it is progressive in development; so arguably, once red-black trees are understood, they have more conceptual context to aid memory, or the refreshing of memory.

concurrent access application
Apart from using randomization to enhance a basic memory structure of linked lists, skip lists can also be extended as a global data structure used in a multiprocessor application. See the supplementary topic at the end of the chapter.

Idea for an exercise
Replace the Linux completely fair scheduler red-black tree implementation with a skip list, and see how your brand of Linux runs after recompiling.

Treaps
A treap is a two-keyed binary tree that uses a second, randomly generated key and the previously discussed tree operation of parent-child rotation to randomly rotate the tree so that, overall, a balanced tree is produced. Recall that binary search trees work by keeping all nodes in the left subtree smaller than a given node, and all nodes in the right subtree greater. Also recall that node rotation does not break this order (some people call it an invariant), but changes the relationship of parent and child: if the parent was smaller than a right child, then the parent becomes the left child of the formerly right child. The idea of a tree-heap, or treap, is that a binary heap relationship is maintained between parents and children: a parent node has higher priority than its children, which is independent of the left-right order of keys in the binary tree. Hence a recently inserted leaf node which happens to have a high random priority can be rotated up so that it sits relatively higher in the tree, with no parent of lower priority.
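A minimal Python sketch of treap insertion, assuming priorities drawn uniformly at random and a max-heap order on priorities (the names are ours):

import random

class TreapNode:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()   # the second, randomly generated key
        self.left = None
        self.right = None

def rotate_right(node):
    # the left child becomes the parent; key order is preserved
    child = node.left
    node.left = child.right
    child.right = node
    return child

def rotate_left(node):
    child = node.right
    node.right = child.left
    child.left = node
    return child

def insert(node, key):
    if node is None:
        return TreapNode(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:   # heap property violated:
            node = rotate_right(node)            # rotate the child up
    else:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node

root = None
for key in [5, 2, 8, 1, 9]:
    root = insert(root, key)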

A treap is an alternative to both red-black trees, and skip lists, as a self-balancing sorted storage structure.

Treaps compared and contrasted to Splay trees
Splay trees are similar to treaps in that rotation is used to bring a higher-priority node to the top without changing the main key order, except that instead of using a random key for priority, the last accessed node is rotated to the root of the tree, so that more frequently accessed nodes stay near the top. This means that in treaps, an inserted node rotates only up to the height given by its random priority key, whereas in splay trees the inserted node is rotated all the way to the root; moreover, every search in a splay tree results in a re-balancing, but not so in a treap.

Derandomization
[TODO: Deterministic algorithms for Quicksort exist that perform as well as quicksort in the average case and are guaranteed to perform at least that well in all cases. Best of all, no randomization is needed. Also in the discussion should be some perspective on using randomization: some randomized algorithms give you better confidence probabilities than the actual hardware itself! (e.g. sunspots can randomly flip bits in hardware, causing failure, which is a risk we take quite often)]

[Main idea: Look at all blocks of 5 elements, and pick the median (O(1) to pick), put all medians into an array (O(n)), recursively pick the medians of that array, repeat until you have < 5 elements in the array. This recursive median constructing of every five elements takes time T(n)=T(n/5) + O(n), which by the master theorem is O(n). Thus, in O(n) we can find the right pivot. Need to show that this pivot is sufficiently good so that we're still O(n log n) no matter what the input is. This version of quicksort doesn't need rand, and it never performs poorly. Still need to show that element picked out is sufficiently good for a pivot.]
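A Python sketch of the recursive pivot construction described in the note (the function name is ours):

def median_of_medians(vals):
    # median of each block of five, then recurse on the medians
    if len(vals) < 5:
        return sorted(vals)[len(vals) // 2]
    medians = []
    for i in range(0, len(vals), 5):
        block = sorted(vals[i:i + 5])
        medians.append(block[len(block) // 2])
    return median_of_medians(medians)

print(median_of_medians(list(range(100))))   # prints 62, a guaranteed-good pivot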

Exercises

 * 1) Write a find-min function and run it on several different inputs to demonstrate its correctness.

Supplementary Topic: skip lists and multiprocessor algorithms
Multiprocessor hardware provides CAS (compare-and-set) or CMPXCHG (compare-and-exchange; Intel manual 253666.pdf, p. 3-188) atomic operations. An expected value is loaded into the accumulator register and compared with a target memory location's contents; if they are the same, a source value is stored into the target location and the zero flag is set; otherwise, the target memory's contents are returned in the accumulator and the zero flag is unset, signifying, for instance, lock contention. In the Intel architecture, a LOCK prefix is issued before CMPXCHG, which either locks the cache line from concurrent access if the memory location is being cached, or locks a shared memory location if not in the cache, for the duration of the instruction.

CMPXCHG can be used to implement locking; spinlocks, which simply retry until the zero flag is set, are the simplest in design.

Lockless design increases efficiency by avoiding spinning while waiting for a lock.

The Java standard library has an implementation of non-blocking concurrent skip lists, based on a paper titled "A Pragmatic Implementation of Non-Blocking Linked-Lists".

The skip list implementation is an extension of the lock-free singly linked list, a description of which follows:

The insert operation is: given X -> Y, to insert N, set N -> Y, then X -> N; the expected result is X -> N -> Y.

A race condition arises if M is being inserted between X and Y at the same time: if M completes first, and then N completes, the situation is X -> N -> Y <- M,

so M is not in the list. The CAS operation avoids this, because a copy of -> Y is checked against the current value of X -> before updating X ->.

If N gets to update X -> first, then when M tries to update X ->, its copy of X -> Y (which it took before setting M -> Y) does not match the current X -> N, so CAS returns with the zero flag unset (recall that CAS requires the user to load the accumulator with the expected value, the target location's current value, and then atomically updates the target location from the source only if the target location still contains the accumulator's value). The process that tried to insert M can then retry the insertion after X, but now the CAS checks that -> N is X's next pointer, so after the retry, X -> M -> N -> Y, and neither insertion is lost.

If M updates X -> first, N's copy of X -> Y does not match X -> M, so the CAS will fail here too, and the above retry by the process inserting N would have the serialized result X -> N -> M -> Y.
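The retry logic can be sketched with a toy CAS cell in Python. Real implementations use the hardware CMPXCHG; this lock-based emulation is only for illustration, and all names are ours:

import threading

class CasRef:
    # a single mutable reference supporting compare-and-set
    def __init__(self, value):
        self.value = value
        self._lock = threading.Lock()

    def compare_and_set(self, expected, new):
        with self._lock:
            if self.value is expected:
                self.value = new
                return True       # "zero flag set": the update succeeded
            return False          # contention: someone changed the cell first

class Node:
    def __init__(self, value, succ=None):
        self.value = value
        self.next = CasRef(succ)  # each node's next pointer is a CAS cell

def insert_after(x, value):
    while True:
        y = x.next.value                   # take a copy of "-> Y"
        n = Node(value, y)                 # N -> Y
        if x.next.compare_and_set(y, n):   # X -> N, only if X -> Y still holds
            return n
        # CAS failed: a concurrent insert changed X's next pointer; retry

x = Node("X", Node("Y"))
insert_after(x, "N")
print(x.next.value.value, x.next.value.next.value.value)   # prints: N Y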

The delete operation depends on a separate 'logical' deletion step before 'physical' deletion.

'Logical' deletion involves a CAS change of the next pointer into a 'marked' pointer. The Java implementation instead atomically inserts a proxy marker node after the node being deleted.

This prevents future insertions from inserting after a node whose next pointer is 'marked', making that node 'logically' deleted.

The insert operation relies on another function, search, which returns two node pointers that were unmarked at the time of the invocation: the first pointing to a node whose next pointer is equal to the second.

The first node is the node before the insertion point.

The insert CAS operation checks that the current next pointer of the first node corresponds to the unmarked reference of the second, so it will fail 'logically' if the first node's next pointer has become marked after the call to the search function above, i.e. if the first node has been concurrently logically deleted. This meets the aim of preventing an insertion from occurring concurrently after a node has been deleted.

If the insert operation fails the CAS of the previous node's next pointer, the search for the insertion point starts again from the start of the entire list, since a new unmarked previous node needs to be found, and there are no backward pointers because the list nodes are singly linked.



The delete operation outlined above also relies on the search operation returning two unmarked nodes, and on two CAS operations in delete: one for the logical deletion, marking the second node's next pointer, and the other for the physical deletion, making the first node's next pointer point to the second node's unmarked next pointer.

The first CAS of delete happens only after a check that the copy of the original second node's next pointer is unmarked, and it ensures that only one concurrent delete succeeds: the one that reads the second node's current next pointer as still unmarked.

The second CAS checks that the previous node hasn't been logically deleted (its next pointer would then no longer equal the unmarked pointer to the current second node returned by the search function), so only an active previous node's next pointer is 'physically' updated to a copy of the original unmarked next pointer of the node being deleted (whose own next pointer has already been marked by the first CAS).

If the second CAS fails, then the previous node has been logically deleted and its next pointer is marked, as is the current node's next pointer. Calling the search function again tidies things up: in endeavouring to find the key of the current node and return adjacent unmarked previous and current pointers, it truncates strings of logically deleted nodes along the way.

Lock-free programming issues
Starvation is possible, as failed inserts have to restart from the front of the list. Wait-freedom is the property that an algorithm keeps all threads safe from starvation.

The ABA problem also exists: a garbage collector recycles the pointer A, the same address is reused for different contents, and the pointer is re-inserted at a point where another thread, which read A earlier, is doing a CAS to check that A has not changed; the address is the same and is unmarked, but the contents of A have changed.