MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
Bin Fan and David G. Andersen, Carnegie Mellon University; Michael Kaminsky, Intel Labs
This paper presents a set of architecturally and workload inspired algorithmic and engineering improvements to the popular Memcached system that substantially improve both its memory efficiency and throughput. These techniques—optimistic cuckoo hashing, a compact LRU-approximating eviction algorithm based upon CLOCK, and comprehensive implementation of optimistic locking—enable the resulting system to use 30% less memory for small key-value pairs, and serve up to 3x as many queries per second over the network. We have implemented these modifications in a system we call MemC3—Memcached with CLOCK and Concurrent Cuckoo hashing—but believe that they also apply more generally to many of today’s read-intensive, highly concurrent networked storage and caching systems.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
@inproceedings{fan-memc3-nsdi13,
author = {Bin Fan and David G. Andersen and Michael Kaminsky},
title = {{MemC3}: Compact and Concurrent {MemCache} with Dumber Caching and Smarter Hashing},
booktitle = {10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13)},
year = {2013},
isbn = {978-1-931971-00-3},
address = {Lombard, IL},
pages = {371--384},
url = {https://www.usenix.org/conference/nsdi13/technical-sessions/presentation/fan},
publisher = {USENIX Association},
month = apr
}
Presentation Video
Presentation Audio
by Matthew Caesar
Modern data center environments are subject to some of the largest and most extreme workloads of any computing system that has ever existed. At the same time, they are also subject to some of the most extreme performance requirements. With interactive services (gaming, financial services) increasingly moving to the cloud, and as use of cloud computing continues to grow, the challenge faced by data center operators in designing low-latency systems at scale and low cost will continue to worsen.
To address this challenge, the authors of this paper propose MemC3. MemC3 is a low-latency, memory-based distributed object caching system for data centers and other networked environments. MemC3 provides a hash-table-like API for queries. When placed as a front end before services such as databases or web servers, MemC3 reduces both network traffic and the load on those services.
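To make the front-end role concrete, the sketch below shows the kind of look-aside read path such a cache enables: check the cache first, fall back to the backing store on a miss, then populate the cache so later reads are served from memory. The cache_get, cache_set, and db_fetch helpers are hypothetical stand-ins for illustration, not MemC3's or Memcached's actual API.

#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for a Memcached/MemC3 client and a backing store;
 * names, signatures, and the placement policy are illustrative only. */
#define CACHE_SLOTS 64

static char cache_keys[CACHE_SLOTS][32];
static char cache_vals[CACHE_SLOTS][128];

static const char *cache_get(const char *key) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (strcmp(cache_keys[i], key) == 0)
            return cache_vals[i];          /* hit: no backend work needed */
    return NULL;                           /* miss */
}

static void cache_set(const char *key, const char *val) {
    int slot = (int)(strlen(key) % CACHE_SLOTS);   /* trivial placement, sketch only */
    snprintf(cache_keys[slot], sizeof cache_keys[slot], "%s", key);
    snprintf(cache_vals[slot], sizeof cache_vals[slot], "%s", val);
}

static const char *db_fetch(const char *key) {
    /* Pretend this is an expensive database or web-server query. */
    printf("backend query for %s\n", key);
    return "value-from-backend";
}

/* Look-aside read path: try the cache, fall back to the backend on a miss,
 * then populate the cache so subsequent reads avoid the backend entirely. */
static const char *lookup(const char *key) {
    const char *v = cache_get(key);
    if (v) return v;
    v = db_fetch(key);
    cache_set(key, v);
    return v;
}

int main(void) {
    printf("%s\n", lookup("user:42"));   /* misses, queries the backend */
    printf("%s\n", lookup("user:42"));   /* served from the cache */
    return 0;
}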
Now, nothing I've described so far is actually new. This problem has already been identified and solved by a widely used system called Memcached. Memcached has reached a high level of maturity and industry acceptance, being used by sites such as YouTube, Wikipedia, Facebook, and Twitter. However, MemC3 provides a phase change in how these sorts of systems are built. The authors present a set of algorithmic and engineering improvements to Memcached that offer a surprisingly large performance improvement. The fact that the authors were able to achieve such enormous performance gains in a system of such core importance to so many modern web services is a testament to the cleverness and insight behind the work.
At the same time, MemC3 leaves several avenues open for future work in this space. First, MemC3 may have a locking bottleneck in some settings, as the core algorithms of the paper rely on serializing all writers. This is totally fine for the read-oriented workloads considered by the paper, which are common in many settings, but could harm performance for more write-intensive workloads. The workloads used in the paper are also synthetic, leaving some open questions about how well the approach would work in real-world deployments. Also, while cuckoo hashing improves memory usage, it can worsen lookup performance, as it increases the raw lookup path length and makes inserts more costly (that said, the in-hash-table tag optimization appears to eliminate most of this cost). In addition, the authors only consider one design; it may be interesting to investigate what other alternatives are available and whether cuckoo hashing is the best option amongst them.
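To make that tag optimization concrete, here is a minimal, self-contained sketch of a tagged two-bucket cuckoo lookup. The hash function, table sizes, and bucket layout are invented for illustration and are not the paper's actual data structure; the point is only that the short per-slot tag filters out nearly all non-matching slots, so the full key comparison on the longer two-bucket lookup path is rarely executed.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch of tagged cuckoo lookup: 2 candidate buckets, 4 slots each.
 * Hashing, sizes, and layout are illustrative, not MemC3's actual code. */
#define NBUCKETS 1024          /* power of two, so masking works below */
#define SLOTS_PER_BUCKET 4

struct slot {
    uint8_t tag;               /* short fingerprint of the key */
    const char *key;           /* NULL means the slot is empty */
    const char *value;
};

static struct slot table[NBUCKETS][SLOTS_PER_BUCKET];

static uint32_t hash_key(const char *key) {
    uint32_t h = 2166136261u;                 /* FNV-1a, just for the sketch */
    for (; *key; key++) h = (h ^ (uint8_t)*key) * 16777619u;
    return h;
}

static uint8_t tag_of(uint32_t h) { uint8_t t = h >> 24; return t ? t : 1; }

/* The second candidate bucket is derived from the first bucket and the tag,
 * so it can be recomputed without re-reading the full key. */
static uint32_t alt_bucket(uint32_t b, uint8_t tag) {
    return (b ^ (tag * 0x5bd1e995u)) & (NBUCKETS - 1);
}

static const char *lookup(const char *key) {
    uint32_t h = hash_key(key);
    uint8_t tag = tag_of(h);
    uint32_t b[2];
    b[0] = h & (NBUCKETS - 1);
    b[1] = alt_bucket(b[0], tag);

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < SLOTS_PER_BUCKET; j++) {
            struct slot *s = &table[b[i]][j];
            /* The 1-byte tag rejects almost all non-matching slots, so the
             * expensive full-key comparison is rarely executed even though
             * two buckets must be examined. */
            if (s->key && s->tag == tag && strcmp(s->key, key) == 0)
                return s->value;
        }
    return NULL;
}

int main(void) {
    /* Place one entry by hand just to exercise the lookup path. */
    uint32_t h = hash_key("user:42");
    table[h & (NBUCKETS - 1)][0] =
        (struct slot){ tag_of(h), "user:42", "hello" };
    const char *v = lookup("user:42");
    printf("%s\n", v ? v : "(miss)");
    return 0;
}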
Another interesting question might be a study of which contributions in this paper provide the benefit. For example, how much of the benefit comes from CLOCK rather than from cuckoo hashing? From the results (Figure 8 and Table 3) we can see that simply changing LRU to CLOCK can significantly improve system performance, even without using cuckoo hashing (that said, moving to cuckoo + CLOCK is even better). This may be an interesting research question in itself: cuckoo hashing is more complex and expensive than chaining, so is it possible to achieve a high load factor with a simpler mechanism? Or is there a fundamental tradeoff between write complexity and space utilization? In addition, the paper (quite understandably) does not provide a performance comparison with all related work in this space, including works like Masstree and other systems that shard data across cores, which seem related. Moreover, as the authors point out, none of the techniques are new in themselves (though the program committee felt there was substantial novelty and cleverness in their combination and optimization). Finally, the program committee wrestled with the fit of this paper for NSDI, since it is more about a system used by networked services than about "networked and/or distributed systems" as such; it is arguably somewhat out of scope, as its primary contributions are in the design of concurrent data structures.
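To illustrate why replacing LRU with CLOCK already helps on its own, here is a minimal CLOCK sketch with invented names and sizes, not MemC3's implementation: a hit only sets a reference bit (no linked-list manipulation, no per-hit pointer updates), and eviction sweeps a clock hand that clears bits until it finds an unreferenced victim.

#include <stdbool.h>
#include <stdio.h>

/* Minimal CLOCK eviction sketch; sizes and names are illustrative only.
 * Compared with strict LRU, a cache hit just sets one bit instead of
 * unlinking/relinking a list node, which is what makes it cheap and
 * friendly to concurrent readers. */
#define NENTRIES 8

static bool ref_bit[NENTRIES];   /* set on every hit to entry i */
static bool in_use[NENTRIES];    /* whether slot i currently holds an item */
static int hand = 0;             /* clock hand position */

/* Called on a cache hit: O(1), no pointer updates. */
static void touch(int i) { ref_bit[i] = true; }

/* Pick a victim slot to evict: sweep the hand, giving recently
 * referenced entries a "second chance" by clearing their bit. */
static int evict(void) {
    for (;;) {
        if (!in_use[hand] || !ref_bit[hand]) {
            int victim = hand;
            hand = (hand + 1) % NENTRIES;
            return victim;
        }
        ref_bit[hand] = false;           /* second chance */
        hand = (hand + 1) % NENTRIES;
    }
}

int main(void) {
    /* Fill the cache, touch a few entries, then see which slot is evicted. */
    for (int i = 0; i < NENTRIES; i++) in_use[i] = true;
    touch(0); touch(3); touch(5);
    printf("evict slot %d\n", evict());  /* skips the recently touched slot 0 */
    return 0;
}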
Now, there's no way the authors could have addressed all these things in one paper. But it is clear that there is more work to be done in this space and more papers to write. In some sense, the authors have opened a door into a whole new area of interesting research questions. What other algorithms could be used in place of cuckoo hashing? What is the right way to design distributed object caching architectures? Caching as a mechanism is used in many environments, and the authors' contributions seem quite fundamental; are there other environments where cuckoo hashing can provide performance benefits? It is also worth noting that the authors have released a full implementation of their hash table library on GitHub (https://github.com/efficient/libcuckoo), which may provide a useful platform for beginning to investigate such questions.
In addition to making solid technical contributions to a high-impact problem, and achieving surprisingly good results, the paper raises as many questions as it answers, something the program committee appreciated and felt was the mark of a truly excellent piece of work. If you're leafing through the NSDI papers and trying to decide which to read, I would highly recommend this one: it's exciting and easy to read, yet has impressive technical depth and insight.