Herlihy [14] presents a methodology for converting sequential implementations of data structures into wait-free concurrent implementations. The goal is to provide a specification and transformation that is provably correct and can be applied automatically to sequential code. It converts a sequential implementation of any data structure into a wait-free, concurrent one, just using CAS (or, slightly more efficiently [14] using load-linked and store-conditional). However, this method involves copying the entire data structure, modifying the copy, and then atomically replacing the old copy with the new copy using CAS, and retrying the entire copy and modifying if there is a conflict. Performance can be improved using other, more ad-hoc, techniques [14], but these techniques tend to add hard-to-catch subtle synchronization problems and are still expensive. Overall, we regard this approach as impractically expensive because of the copy overhead.
In contrast, our contribution is a set of general techniques that the programmer incorporates in the software design and implementation, allowing the software to be used in both sequential and parallel execution with no modification and with acceptable performance.
-comment Help!! I dont understand this paragraph at all. Herlihy finds that the probabilistic guarantee against starvation provided by using exponential backoff with the lock-free implementation, is preferable to the deterministic guarantee provided by a wait-free implementation due to the high overhead in the wait-free case. Similar arguments apply to a wide range of algorithmic choices. Consider the modifications of Barnes [3], Turek [23], and Valois [24] to increase the concurrency and reduce the rate of retries when multiple processes using synchronized, non-blocking, access to a shared data structure. We find that the extra overhead of their schemes in the low contention cases outweighs the savings in reduced retries under high contention -- the gains in concurrency are eaten up by the higher cost and increased memory contention. Our simple protocol using a single version number for the entire data structure does limit concurrency, but the cost of the operations is often several times cheaper.
Barnes [3], Turek [23], and Valois [24] provide techniques for increasing the concurrency with some non-blocking synchronization. However, the cost of concurrent updates appears to outweigh the actual benefit, because the low rates of contention in our system. Studies such as [22], which also reported a low level of contention on kernel data structures, suggest that this phenomenon might be more widely true than just in the Cache Kernel.