- ...Cheriton
- {michaelg, cheriton}@cs.stanford.edu.
This work was sponsored in part by ARPA under US Army contract
DABT63-91-K-0001. Michael Greenwald was supported
by a Rockwell Fellowship.
- ...type
-
An example of a TSM implementation is
a collection of descriptors that are stored in a set of page frames
which are allocated and released over time.
When more descriptors are required,
additional page frames can be allocated from the general pool
and when the number of descriptors falls,
the descriptors may be consolidated into a smaller number of pages
and the excessive page frames returned to the general pool.
However, the release of page frames to the general pool must be
delayed
sufficiently to ensure the type-stability property.
This delay provides a useful hysteresis to the movement of pages
between this descriptor collection and the general page pool.
- ...list
- The list is
initialized with a dummy node at the head, thus deletion of the first
element works correctly.
- ...system
- Physical contention
is separate from logical contention because
one can have logical contention without physical contention
as well as vice versa, so called false sharing.
For example, if two shared variable can reside in the same cache line
unit so there can be physical contention without logical contention
if two processor attempt to update the variables simultaneously,
each processor updating a separate variable.
- ...descriptor
- The basic
Herlihy approach involves copying the entire data structure,
modifying the copy, and then atomically replacing the
old copy with the new copy using CAS, and retrying the entire copy
and modifying if there is a conflict.
Our approach reduces the allocation and copy cost to a single descriptor
rather than the entire data structure but requires DCAS.
- ...V1
- Given
data structures that are protected by a version number, te DCAS is
actually a Compare-And-Double-Swap (CADS) --- the second value
cannot have changed if the version number is unchanged. In these
cases a minor optimization is possible and line 4 can be deleted.
- ...[#valois##1#]
- It was necessary to derive our own version
of the algorithm, as the algorithm presented in [24]
is not strictly correct.
This is the natural result of the complicated contortions necessary
when using only CAS.
The DCAS algorithm is relatively straightforward.
- ...processes
- The Valois simulation in Michael and
Scott [18] reports better asymptotic behavior than we do.
The difference appears because the authors are only simulating a FIFO
queue. In the FIFO queue algorithm -- where
insertion always occurs at the tail and deletion at the head --
auxiliary nodes are not
traversed in general and thus don't affect completion time. In fully
general lists auxiliary nodes increase the execution time and memory
traffic.
- ...other
-
Consider processors 15#15, 16#16 and 17#17.
18#18 accesses cache lines Y,Z, 19#19 X,Y, and 20#20
W,X (addressed in ascending alphabetical order).
21#21 and 22#22 should not interact.
However, if 23#23 holds Y and Z and 24#24 holds X,
then when 25#25 asks 26#26 for Y, 27#27 stalls, and buffers 28#28's
request for X. Thus, 29#29 delays 30#30. Longer chains can be
constructed.
- ...use
- Their scheme requires twice the memory
for every possibly shared location
and extra overhead of at least a factor of three for reads and writes
even in the case of no contention.
Michael Greenwald
Wed Sep 18 15:42:13 PDT 1996