Workshop Program

All sessions will be held in Interlocken Ballroom C unless otherwise noted.

The full papers published by USENIX for the workshop are available for download below, as an archive or individually, to workshop registrants immediately and to everyone beginning October 5, 2014. Everyone can view the abstracts immediately. Copyright to the individual works is retained by the author(s).

Download Paper Archive: INFLOW '14 Paper Archive (ZIP)

8:30 a.m.–9:00 a.m. Sunday

Continental Breakfast

Centennial Foyer

9:00 a.m.–9:05 a.m. Sunday

Opening Remarks

Program Co-Chairs: Kaoutar El Maghraoui and Gokul Kandiraju, IBM T. J. Watson Research Center 

9:05 a.m.–10:30 a.m. Sunday

Flash Transformed Data Centers

Session Chair: Gokul Kandiraju, IBM T. J. Watson Research Center

Keynote Address I: Blueprints for a Flash-Transformed Data Center: Challenges and Opportunities for System Software

Pankaj Mehra, Senior Fellow, SanDisk

We begin with an examination of the technical value propositions of flash memory in the data center from the perspective of the major workload types in the data tier. We survey recent work in exposing and exploiting flash memory technology’s key capabilities through software in the control and data planes, and roadmap the major integrations needed in order to deliver on the promised value propositions.

The conversation will then shift to creating blueprints for successful deployment of flash technology at scale. The blueprints are categorized by market segments and scale. They guide us in prioritizing OS research for supporting the most suitable attach points and form factors, interfaces and protocols, standards and open source, benchmarks and workloads.

The final remarks will address the exploration-convergence dilemma before us, forcing us to choose: between creative application and standardization; between device-level and service-level offerings; between flash as memory and flash as disk; and between control-plane and data-plane value propositions. It is argued that these are often false choices, that a "both…and" approach will prove superior to an "either…or" approach in order to justify the level of economic investment needed for realizing our vision of an all-flash data center.

Dr. Pankaj Mehra is currently a SanDisk Senior Fellow. He was SVP & CTO at enterprise flash technology pioneer Fusion-io and before that at Whodini, an e-mail analytics company he founded. Appointed a Distinguished Technologist at Hewlett-Packard in 2004 for his groundbreaking work on persistent memory, Pankaj went on to found HP Labs Russia where he was Chief Scientist until 2010, and where he incubated Taxonom.com, a cloud service for creating ontologies from queries, document collections, and examples. He previously served on the faculty of IIT Delhi and UC Santa Cruz. Pankaj’s 48 filed patents, 29 papers, and 3 books cover a range of topics in scalable intelligent systems, and his engineered systems have held TPC-C and Terabyte Sort performance records and won recognition from NASA and Sandia National Labs. Pankaj volunteers at MMDS Foundation as their industry liaison chair, and previously served on the editorial boards of IEEE Internet Computing and Transactions on Computers, and on numerous program committees ranging from SuperComputing to International Semantic Web Conference.

Don’t Stack Your Log On My Log

Jingpei Yang, Ned Plasson, Greg Gillis, Nisha Talagala, and Swaminathan Sundararaman, SanDisk Corporation

Log-structured applications and file systems have been used to achieve high write throughput by sequentializing writes. Flash-based storage systems, due to flash memory’s out-of-place update characteristic, have also relied on log-structured approaches. Our work investigates the impact on flash performance and endurance when multiple layers of log-structured applications and file systems are stacked on top of a log-structured flash device. We show that multiple log layers affect sequentiality and increase write pressure on flash devices through randomization of workloads, unaligned segment sizes, and uncoordinated multi-log garbage collection. All of these effects can combine to negate the intended positive effects of using a log. In this paper we characterize the interactions between multiple levels of independent logs, identify issues that must be considered, and describe design choices to mitigate negative behaviors in multi-log configurations.
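
To make the layering effect concrete: each log's garbage-collection traffic becomes input writes to the layer below, so write amplification composes multiplicatively across stacked logs. A back-of-the-envelope sketch with hypothetical numbers (not measurements from the paper):

    # Hypothetical write-amplification (WA) figures, for illustration only.
    wa_fs  = 1.5   # log-structured FS: GC rewrites 0.5 extra bytes per user byte
    wa_ftl = 2.0   # flash device FTL: GC rewrites 1 extra byte per byte received

    # Every byte the file system emits (user data plus its own GC copies)
    # passes through the FTL, which amplifies it again, so the factors multiply.
    print("end-to-end WA:", wa_fs * wa_ftl)   # 3.0x, not 2.5x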

10:30 a.m.–11:00 a.m. Sunday

Break with Refreshments

Centennial Foyer

11:00 a.m.–12:30 p.m. Sunday

Tiered Storage

Session Chair: Kaoutar El Maghraoui, IBM T. J. Watson Research Center

Keynote Address II: Optimal Flash Partitioning for Storage Workloads in Google's Colossus File System

Arif Merchant, Research Scientist, Google

Janus is a system for partitioning the flash storage tier between workloads in a cloud-scale distributed file system with two tiers, flash storage and disk. The file system stores newly created files in the flash tier and moves them to the disk tier using either a First-In-First-Out (FIFO) policy or a Least-Recently-Used (LRU) policy, subject to per-workload allocations. Janus periodically computes the optimal partitioning of the available flash between workloads to maximize the total reads sent to the flash tier, based on the measured workload characteristics. This talk will describe the motivation behind the design of Janus, some of the analytical techniques used to model the workloads, the implementation, and some results.
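
To picture the optimization, here is a minimal greedy sketch in Python. It assumes each workload's reads-served-from-flash is a concave function of its allocation; the function and curves below are illustrative stand-ins, not Janus's actual formulation:

    import math

    def partition_flash(read_curves, total_gib, step_gib=1.0):
        """read_curves: per-workload functions alloc_gib -> reads/s served
        from flash. With concave curves, repeatedly granting the next slice
        of flash to the workload with the largest marginal gain in reads
        approximates the read-maximizing partition."""
        alloc = [0.0] * len(read_curves)
        remaining = total_gib
        while remaining >= step_gib:
            gains = [f(a + step_gib) - f(a) for f, a in zip(read_curves, alloc)]
            best = max(range(len(gains)), key=gains.__getitem__)
            alloc[best] += step_gib
            remaining -= step_gib
        return alloc

    # Two hypothetical workloads with diminishing returns from extra flash.
    curves = [lambda a: 1000 * math.log1p(a),     # hot, read-heavy workload
              lambda a: 200 * math.log1p(a / 4)]  # colder working set
    print(partition_flash(curves, total_gib=100))  # most flash goes to the first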

Arif Merchant is a Research Scientist with the Storage Analytics group at Google, where he studies interactions between components of the storage stack. Prior to this, he was with HP Labs, where he worked on storage QoS, distributed storage systems, and stochastic models of storage. He holds the B.Tech. degree from IIT Bombay and the Ph.D. in Computer Science from Stanford University. He is an ACM Distinguished Scientist.

How Could a Flash Cache Degrade Database Performance Rather Than Improve It? Lessons to be Learnt from Multi-Tiered Storage

Hyojun Kim, IBM Almaden Research Center; Ioannis Koltsidas and Nikolas Ioannou, IBM Research Zürich; Sangeetha Seshadri, Paul Muench, Clement L. Dickey, and Lawrence Chiu, IBM Almaden Research Center

Contrary to intuition, host-side flash caches can degrade performance rather than improve it. Because flash write operations are expensive, cache hit-rates need to be relatively high to offset the overhead of writes; otherwise, end-to-end performance can be worse with a flash cache than without one.

We believe that some lessons learnt from multi-tiered storage systems can be applied to flash cache management. Multi-tiered storage systems migrate data based on long-term I/O monitoring, carefully ensuring that the background data migration does not adversely affect foreground I/O performance.

To test our hypothesis, we designed and implemented a new flash cache, named Scalable Cache Engine (SCE). In SCE, cache population occurs in the background in 1 MiB fragments rather than at the typical storage I/O size (4 KiB). By doing so, we warm up the flash cache much faster while also benefiting from a prefetching effect that is very effective at improving cache hit-rates when the workload exhibits strong spatial locality. Additionally, large, aligned writes to flash are much more efficient than small random ones, reducing the cache population overhead. We show that our approach successfully tackles several issues with existing flash cache management approaches and works well for OLTP database workloads. For instance, throughput under a TPC-E workload actually degraded by 79.1% with flashcache, a popular open-source solution, compared to the baseline performance; under the same conditions, SCE achieved a 301.7% throughput improvement.
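
A minimal sketch of the population mechanism described above (my own illustration in Python; SCE itself is a block-level cache, and its real interfaces are not shown in this abstract):

    import queue, threading

    FRAGMENT = 1 << 20   # 1 MiB population unit (vs. 4 KiB foreground I/O)

    class FragmentCache:
        """On a miss, serve the small read from disk immediately and queue
        the enclosing 1 MiB fragment for background population: one large,
        aligned flash write instead of many small random ones. (The hit
        path, which would read from flash, is omitted for brevity.)"""
        def __init__(self, disk_read, flash_write):
            self.disk_read, self.flash_write = disk_read, flash_write
            self.resident = set()            # fragment ids already on flash
            self.pending = queue.Queue()
            threading.Thread(target=self._populator, daemon=True).start()

        def read(self, offset, length):
            frag = offset // FRAGMENT
            data = self.disk_read(offset, length)    # foreground path
            if frag not in self.resident:
                self.pending.put(frag)               # populate later
            return data

        def _populator(self):
            while True:
                frag = self.pending.get()
                if frag not in self.resident:
                    base = frag * FRAGMENT
                    self.flash_write(base, self.disk_read(base, FRAGMENT))
                    self.resident.add(frag)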

12:30 p.m.–2:00 p.m. Sunday

Workshop Luncheon

Interlocken B

2:00 p.m.–3:30 p.m. Sunday

FTL/Performance

Session Chair: Sam H. Noh, Hongik University, Korea 

Keynote Address III: Flash Math—FTL Algorithms and Performance

Peter Desnoyers, Associate Professor, Northeastern University

What makes one workload perform well on a particular SSD, and another poorly? How can we estimate performance from workload parameters? This talk will review recent results in analytic modeling of FTL performance and show how these results may be used not only to answer these questions, but to modify workloads and FTLs for better performance.
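
As a taste of the questions the talk addresses, here is a toy Monte Carlo FTL (a sketch, not the speaker's analytic models): uniform random single-page overwrites with greedy garbage collection, showing how write amplification climbs as spare capacity shrinks:

    import random

    def write_amplification(nblocks=100, ppb=128, spare=0.07, nwrites=400_000):
        """Toy FTL: uniform random single-page overwrites; greedy GC erases
        the block with the fewest valid pages and copies its survivors back.
        Returns (user writes + GC copies) / user writes."""
        user_pages = int(nblocks * ppb * (1 - spare))
        live = [set() for _ in range(nblocks)]  # valid logical pages per block
        loc = {}                                # logical page -> holding block
        frontier, room = 0, ppb                 # block accepting new writes
        copies = 0
        for _ in range(nwrites):
            lpn = random.randrange(user_pages)
            if lpn in loc:
                live[loc[lpn]].discard(lpn)     # overwrite invalidates old copy
            while room == 0:                    # frontier full: run greedy GC
                victim = min(range(nblocks),
                             key=lambda b: ppb + 1 if b == frontier else len(live[b]))
                copies += len(live[victim])     # survivors copied back after erase
                frontier, room = victim, ppb - len(live[victim])
            live[frontier].add(lpn)
            loc[lpn] = frontier
            room -= 1
        return (nwrites + copies) / nwrites

    for sp in (0.05, 0.10, 0.20, 0.30):
        print(f"spare={sp:.0%}  WA ~ {write_amplification(spare=sp):.2f}")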

Peter Desnoyers is an associate professor at Northeastern University. He received his Ph.D. in Computer Science from the University of Massachusetts, Amherst in 2008. Prior to that he spent fifteen years as an engineer in storage and networking industries, at Apple, Motorola, and a number of start-ups, after receiving the BS and MS degrees in EECS from MIT in 1988. His research focuses on operating systems and storage, drawing on his experience to explore practical solutions to the problems of tomorrow's applications and devices.

Erasure Coding & Read/Write Separation in Flash Storage

Dimitris Skourtis, Dimitris Achlioptas, Noah Watkins, Carlos Maltzahn, and Scott Brandt, University of California, Santa Cruz

We want to create a scalable flash storage system that provides read/write separation and uses erasure coding to provide reliability without the storage cost of replication. Flash on Rails is a system for enabling consistent performance in flash storage by physically separating reads from writes through redundancy. In principle, Rails supports erasure codes. However, it has only been evaluated using replication in small arrays, so it is currently uncertain how it would scale with erasure coding.

In this work we consider the applicability of erasure coding in Rails, in a new system called eRails. We consider the effects of the encoding/decoding computation on raw performance, as well as its effect on performance consistency. We demonstrate that up to a certain number of drives the performance remains unaffected while the computation cost remains modest. Beyond that point the computational cost of the coding itself grows quickly, making further scaling inefficient. To support an arbitrary number of drives, we present a design that scales eRails by constructing overlapping erasure-coding groups that preserve read/write separation. Finally, through benchmarks we demonstrate that eRails achieves read/write separation and consistent read performance under read/write workloads.
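
The core idea can be illustrated as follows (a sketch of the general scheme, not the eRails code): under a k-of-n erasure code, any k drives suffice to reconstruct a stripe, so each time window can dedicate the remaining n - k drives to absorbing writes and rotate the roles:

    def roles(n, k, window):
        """Rotate which n - k drives absorb writes in a given time window;
        the other k serve reads without interference from writes."""
        m = n - k
        writers = {(window * m + i) % n for i in range(m)}
        readers = set(range(n)) - writers
        return readers, writers

    n, k = 6, 4     # e.g. a (6,4) code: any 4 of the 6 drives can serve reads
    for w in range(3):
        readers, writers = roles(n, k, w)
        print(f"window {w}: readers={sorted(readers)} writers={sorted(writers)}")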

3:30 p.m.–4:00 p.m. Sunday

Break with Refreshments

Centennial Foyer

4:00 p.m.–5:30 p.m. Sunday

NVM Compression/Lifetime

Session Chair: Carlos Maltzahn, University of California, Santa Cruz

Compression and SSDs: Where and How?

Aviad Zuck and Sivan Toledo, Tel Aviv University; Dmitry Sotnikov and Danny Harnik, IBM Research—Haifa

Compression is widely used in storage systems to reduce the amount of data that is written to physical storage devices, in order to improve both bandwidth and price per GB. In SSDs, which use NAND flash devices, compression also helps to improve endurance, which is limited to a fixed number of raw bytes written to the media, and to reduce garbage-collection overheads.

Compression is typically implemented in one of three layers: the application, the file system, or the firmware of the storage device. Our main findings are that compression embedded within the SSD outperforms the built-in host-side compression engines of a well-known database and of file systems. We therefore focus on intra-SSD compression schemes. We investigate the effects of compression granularity and of the arrangement of compressed data in NAND flash pages on data reduction and on the lifetime of the SSD. We compare several schemes in this design space, some taken from the literature and some new.
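
To see why the arrangement of compressed data in flash pages matters, consider this self-contained packing sketch; the 16 KiB physical page and 8-byte chunk header are assumed values, and the scheme is a generic greedy layout rather than one of the paper's:

    import random
    import zlib

    PHYS_PAGE = 16384          # assumed physical flash page size
    HEADER    = 8              # assumed per-chunk metadata

    def pages_needed(logical_pages):
        """Greedy packing: append each compressed chunk to the current
        physical page; start a new page when the chunk does not fit.
        Leftover slack at page ends erodes the raw compression ratio."""
        pages, used = 1, 0
        for data in logical_pages:
            need = len(zlib.compress(data)) + HEADER
            if used + need > PHYS_PAGE:
                pages, used = pages + 1, 0
            used += need
        return pages

    # 64 logical 4 KiB pages of synthetic, ~3 bits/byte entropy data.
    logical = [bytes(random.choices(b"abcdefgh", k=4096)) for _ in range(64)]
    print("16 uncompressed pages ->", pages_needed(logical), "packed pages")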

NVM Compression—Hybrid Flash-Aware Application Level Compression

Dhananjoy Das, Dulcardo Arteaga, Nisha Talagala, and Torben Mathiasen, SanDisk Corporation; Jan Lindström, SkySQL

This paper describes NVM Compression, a novel hybrid technique that combines application-level compression with flash awareness for optimal performance and storage efficiency. Utilizing new interface primitives exported by Flash Translation Layers (FTLs), we leverage the garbage collection available in flash devices to optimize the capacity management required by compression systems. We implement NVM Compression in the popular open source database MariaDB (based on Oracle MySQL) and use variants of commonly available POSIX file system interfaces to provide the extended FTL capabilities to the user-space application. The experimental results show that the hybrid approach of NVM Compression can improve compression performance by 2-3x, deliver compression performance for flash devices that is within 5% of uncompressed performance (and sometimes exceeds it, due to fewer data writes), improve storage efficiency by 19% compared to the legacy Row compression method, reduce data writes by up to 4x when combined with other flash-aware techniques such as Atomic Writes, and deliver further advantages in power efficiency and CPU utilization.
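
Schematically, the hybrid approach can be pictured as below. The device calls are hypothetical stand-ins for FTL primitives of the kind the abstract mentions; the names and signatures are assumptions, not the actual interface:

    SLOT = 16384   # fixed logical slot per 16 KiB database page

    def write_compressed_page(dev, page_no, compressed):
        """Write a compressed database page into its fixed logical slot and
        trim the unused tail. The FTL's address map and garbage collector
        then manage the freed capacity; the application never repacks data
        to reclaim holes."""
        base = page_no * SLOT
        dev.atomic_write(base, compressed)           # hypothetical primitive
        tail = SLOT - len(compressed)
        if tail > 0:
            dev.persistent_trim(base + len(compressed), tail)  # hypothetical

    class FakeDev:
        """Stand-in device recording calls, just to make the sketch runnable."""
        def atomic_write(self, off, data): print(f"write {off:>8} +{len(data)}")
        def persistent_trim(self, off, n): print(f"trim  {off:>8} +{n}")

    write_compressed_page(FakeDev(), page_no=3, compressed=b"x" * 5000)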

EqualChance: Addressing Intra-set Write Variation to Increase Lifetime of Non-volatile Caches

Sparsh Mittal, Oak Ridge National Laboratory; Jeffrey S. Vetter, Oak Ridge National Laboratory and Georgia Institute of Technology

To address the limitations of SRAM, such as high leakage and low density, researchers have explored the use of non-volatile memory (NVM) devices, such as ReRAM (resistive RAM) and STT-RAM (spin-transfer torque RAM), for designing on-chip caches. A crucial limitation of NVMs, however, is that their write endurance is low, and the large intra-set write variation introduced by existing cache management policies may further exacerbate this problem, reducing cache lifetime significantly. We present EqualChance, a technique to increase cache lifetime by reducing intra-set write variation. EqualChance works by periodically changing the physical cache-block location of a write-intensive data item within a set to achieve wear-leveling. Simulations using workloads from the SPEC CPU2006 suite and the HPC (high-performance computing) field show that EqualChance improves cache lifetime by 4.29x. Also, its implementation overhead is small, and it incurs very small performance and energy losses.
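
The wear-leveling step can be sketched as follows (an illustration of the general idea; the paper's exact policy, interval, and bookkeeping will differ):

    SWAP_INTERVAL = 16    # assumed: rebalance after this many writes to a set

    class CacheSet:
        """Track per-way write counts; periodically move the data in the
        most-written way to the least-written way so writes to a hot block
        are spread across the set's physical NVM locations."""
        def __init__(self, ways=8):
            self.writes = [0] * ways     # wear counter per physical way
            self.since_swap = 0

        def on_write(self, way):
            self.writes[way] += 1
            self.since_swap += 1
            if self.since_swap >= SWAP_INTERVAL:
                self.since_swap = 0
                hot  = max(range(len(self.writes)), key=self.writes.__getitem__)
                cold = min(range(len(self.writes)), key=self.writes.__getitem__)
                if hot != cold:
                    self.move_block(hot, cold)

        def move_block(self, src, dst):
            # In hardware this exchanges the two ways' data and tags; the
            # wear counters stay with the physical ways, so future writes
            # to the hot block land on the least-worn location.
            pass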

5:30 p.m.–5:35 p.m. Sunday

Closing Remarks

Program Co-Chairs: Kaoutar El Maghraoui and Gokul Kandiraju, IBM T. J. Watson Research Center