Lock |
Confl Acq |
Mean Confl Bytes |
zone.lock |
100.00% |
32.49 |
20.40% |
zone.lru_lock |
65.82% |
6.63 |
20.10% |
ide_lock |
70.92% |
6.93 |
19.03% |
runqueue_t.lock (0xc1807500) |
27.41% |
24.06 |
24.50% |
runqueue_t.lock (0xc180f500) |
29.14% |
26.10 |
23.20% |
inode.i_data.i_mmap_lock |
46.03% |
22.81 |
16.53% |
runqueue_t.lock (0xc1837500) |
27.30% |
27.83 |
19.15% |
seqlock_t.lock |
100.00% |
56.01 |
48.41% |
runqueue_t.lock (0xc1847500) |
23.32% |
26.64 |
19.55% |
runqueue_t.lock (0xc183f500) |
21.03% |
28.21 |
18.37% |
Table 3: Limit study data from the ten longest held kernel spin
locks during the pmake benchmark. This data is
taken from comparing each address set to the last 128 address sets
for that lock, rather than contended acquires, as in
Table 2. Conflicting Acquires are the
percent of acquires that have conflicting data accesses. Mean
Conflicting Bytes shows the average number of conflicting bytes
and their percentage of the total address set size.
4.3 Transactional Memory
To provide a comparison between the performance of lock-based
synchronization and optimistic synchronization, we ran the pmake
benchmark under syncchar on both Linux and on TxLinux with the MetaTM
hardware transactional memory model [
13]. TxLinux is
a version of Linux that has several key synchronization primitives
converted to use hardware transactions. Syncchar measures the time
spent acquiring spinlocks and the time lost to restarted transactions.
For this workload, TxLinux converts 32% of the spinlock acquires in
Linux to transactions, reducing the time lost to synchronization
overhead by 8%. This reduction is largely attributable to removing
cache misses for the spinlock lock variable. The syncchar data
indicates that converting more Linux locks to transactions will yield
speedups that can be exploited by 4–8 processors. However, gaining
even greater concurrency requires data structure redesign.
5 Related work
This paper employs techniques from parallel programming tools to
evaluate the limits of optimistic concurrency. There is a large body
of previous work on debugging and performance tuning tools for
programs [
3, 15]. Our work is different from
other tools because it is concerned more with identifying upper bounds
on performance rather than performance tuning or verifying
correctness.
Lock-free (and modern variants like obstruction-free) data structures
are data-structure specific approaches to optimistic concurrency
[
4, 5]. Lock-free data structures
attempt to change a data structure optimistically, dynamically
detecting and recovering from conflicting accesses.
Transactional memory is a more general form of optimistic
concurrency that allows arbitrary code to be executed atomically.
Herlihy and Moss [
6] introduced one of the earliest
Transactional Memory systems. More recently, Speculative Lock
Elision [
10] and Transactional Lock Removal [
11]
optimistically execute lock regions transactionally. Several designs
for full-function transactional memory hardware have been
proposed [
2, 8, 12].
6 Conclusion
This paper introduces a new methodology and tool for examining the
potential benefits of optimistic concurrency, based on measuring the
data independence of critical section executions. Early results
indicate that there is sufficient data independence to warrant the use
of optimistic concurrency on smaller multiprocessors, but data
structure reorganization will be necessary to realize greater
scalability. The paper motivates more detailed study of the
relationship of concurrency enabled at different levels of
abstraction, and the ease and efficacy of data structure redesign.
7 Acknowledgements
We would like to thank the anonymous reviewers, Hany Ramadan,
Christopher Rossbach, and Virtutech AB. This research is supported in
part by NSF CISE Research Infrastructure Grant EIA-0303609 and NSF
Career Award 0644205.
References
-
1
-
Michael Feldman.
Getting serious about transactional memory.
HPCWire, 2007.
- 2
-
L. Hammond, V. Wong, M. Chen, B. Carlstrom, J. Davis, B. Hertzberg, M. Prabhu,
H. Wijaya, C. Kozyrakis, and K. Olukotun.
Transactional memory coherence and consistency.
In ISCA, 2004.
- 3
-
J. Harrow.
Runtime checking of multithreaded applications with visual threads.
In SPIN, pages 331–342, 2000.
- 4
-
M. Herlihy.
Wait-free synchronization.
In TOPLAS, January 1991.
- 5
-
M. Herlihy, V. Luchangco, and M. Moir.
Obstruction-free synchronization: Double-ended queues as an example.
In ICDCS, 2003.
- 6
-
M. Herlihy and J. E. Moss.
Transactional memory: Architectural support for lock-free data
structures.
In ISCA, May 1993.
- 7
-
P.S. Magnusson, M. Christianson, and J. Eskilson et al.
Simics: A full system simulation platform.
In IEEE Computer vol.35 no.2, Feb 2002.
- 8
-
K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood.
LogTM: Log-based transactional memory.
In HPCA, 2006.
- 9
-
C. Papadimitriou.
The serializability of concurrent database updates.
J. ACM, 26(4):631–653, 1979.
- 10
-
R. Rajwar and J. Goodman.
Speculative lock elision: Enabling highly concurrent multithreaded
execution.
In MICRO, 2001.
- 11
-
R. Rajwar and J. Goodman.
Transactional lock-free execution of lock-based programs.
In ASPLOS, October 2002.
- 12
-
R. Rajwar, M. Herlihy, and K. Lai.
Virtualizing transactional memory.
In ISCA. 2005.
- 13
-
H.E. Ramadan, C.J. Rossbach, D.E. Porter, O.S. Hofmann, A. Bhandari, and
E. Witchel.
MetaTM/TxLinux: Transactional memory for an operating system.
In ISCA, 2007.
- 14
-
J. G. Steffan, C. B. Colohan, A. Zhai, and T. C. Mowry.
A scalable approach to thread-level speculation.
In ISCA, 2000.
- 15
-
Y. Yu, T. Rodeheffer, and W. Chen.
Racetrack: Efficient detection of data race conditions via adaptive
tracking.
In SOSP, 2005.
- 1
- We selected the term data conflicts
over data dependence to avoid confusion with other meanings.
This document was translated from LATEX by
HEVEA.