Nadav Amit, VMware Research
The operating system is tasked with maintaining the coherency of per-core TLBs, necessitating costly synchronization operations, notably to invalidate stale mappings. As core-counts increase, the overhead of TLB synchronization likewise increases and hinders scalability, whereas existing software optimizations that attempt to alleviate the problem (like batching) are lacking.
We address this problem by revising the TLB synchronization subsystem. We introduce several techniques that detect cases whereby soon-to-be invalidated mappings are cached by only one TLB or not cached at all, allowing us to entirely avoid the cost of synchronization. In contrast to existing optimizations, our approach leverages hardware page access tracking. We implement our techniques in Linux and find that they reduce the number of TLB invalidations by up to 98% on average and thus improve performance by up to 78%. Evaluations show that while our techniques may introduce overheads of up to 9% when memory mappings are never removed, these overheads can be avoided by simple hardware enhancements.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Nadav Amit},
title = {Optimizing the {TLB} Shootdown Algorithm with Page Access Tracking},
booktitle = {2017 USENIX Annual Technical Conference (USENIX ATC 17)},
year = {2017},
isbn = {978-1-931971-38-6},
address = {Santa Clara, CA},
pages = {27--39},
url = {https://www.usenix.org/conference/atc17/technical-sessions/presentation/amit},
publisher = {USENIX Association},
month = jul
}