Workshop Program

All sessions will be held in Colorado Ballroom F unless otherwise noted.
Papers are available for download below to registered attendees now and to everyone beginning June 20, 2016. Paper abstracts are available to everyone now. Copyright to the individual works is retained by the author[s].

Downloads for Registered Attendees

Attendee Files 
HotStorage '16 Paper Archive (ZIP)
HotStorage '16 Attendee List (PDF)

 

Monday, June 20, 2016

7:30 am–9:00 am Monday

Continental Breakfast

Ballroom Foyer

8:45 am–9:00 am Monday

Opening Remarks

Program Co-Chairs: Nitin Agrawal, Samsung Research, and Sam H. Noh, UNIST (Ulsan National Institute of Science and Technology)

9:00 am–10:40 am Monday

Analyzing Analytics

Session Chair: Nitin Agrawal, Samsung Research

Quartet: Harmonizing Task Scheduling and Caching for Cluster Computing

Francis Deslauriers, Peter McCormick, George Amvrosiadis, Ashvin Goel, and Angela Demke Brown, University of Toronto

Cluster computing frameworks such as Apache Hadoop and Apache Spark are commonly used to analyze large data sets. The analysis often involves running multiple, similar queries on the same data sets. This data reuse should improve query performance, but we find that these frameworks schedule query tasks independently of each other and are thus unable to exploit the data sharing across these tasks. We present Quartet, a system that leverages information on cached data to schedule together tasks that share data. Our preliminary results are promising, showing that Quartet can increase the cache hit rate of Hadoop and Spark jobs by up to 54%. Our results suggest a shift in the way we think about job and task scheduling today, as Quartet is expected to perform better as more jobs are dispatched on the same data.
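
To make the scheduling idea concrete, here is a minimal, illustrative sketch of cache-aware task placement: prefer the node that already caches the most blocks a task will read. The classes and function names are assumptions for illustration only, not Quartet's actual interfaces.

```python
# Illustrative sketch of cache-aware task placement (not Quartet's actual code).
# Idea: among candidate nodes, prefer the one caching the most blocks a task reads.
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    cached_blocks: set = field(default_factory=set)  # block IDs currently in the page cache


@dataclass
class Task:
    task_id: str
    input_blocks: set  # block IDs this task will read


def schedule(task, nodes):
    """Pick the node with the largest overlap between its cache and the task's input."""
    return max(nodes, key=lambda n: len(n.cached_blocks & task.input_blocks))


if __name__ == "__main__":
    nodes = [Node("n1", {1, 2, 3}), Node("n2", {3, 4, 5, 6})]
    task = Task("t1", input_blocks={4, 5, 9})
    print(f"Task {task.task_id} placed on {schedule(task, nodes).name}")  # n2, two cached blocks
```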

Available Media

Lazy Analytics: Let Other Queries Do the Work For You

William Jannen and Michael A. Bender, Stony Brook University; Martin Farach-Colton, Rutgers University; Rob Johnson, Stony Brook University; Bradley C. Kuszmaul, Massachusetts Institute of Technology; Donald E. Porter, Stony Brook University

We propose a class of query, called a derange query, that maps a function over a set of records and lazily aggregates the results. Derange queries defer work until it is either convenient or necessary, and, as a result, can reduce total I/O costs of the system.

Derange queries operate on a view of the data that is consistent with the point in time that they are issued, regardless of when the computation completes. They are most useful for performing calculations where the results are not needed until some future deadline. When necessary, derange queries can also execute immediately. Users can view partial results of in-progress queries at low cost.
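
A minimal sketch of the lazy map-then-aggregate pattern the abstract describes, with deferred work, cheap partial results, and a force path for deadlines. The class and method names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of a lazily evaluated map-then-aggregate query
# (inspired by the derange-query idea; names and structure are assumptions).
class LazyQuery:
    def __init__(self, records, map_fn, reduce_fn, initial):
        self._pending = list(records)   # snapshot of the records at issue time
        self._acc = initial
        self._map = map_fn
        self._reduce = reduce_fn

    def step(self, n=1):
        """Do a little work when convenient (e.g., piggybacked on other activity)."""
        for _ in range(min(n, len(self._pending))):
            self._acc = self._reduce(self._acc, self._map(self._pending.pop()))

    def partial_result(self):
        """Cheaply inspect what has been computed so far."""
        return self._acc, len(self._pending)

    def force(self):
        """Deadline reached: finish all remaining work now."""
        self.step(len(self._pending))
        return self._acc


if __name__ == "__main__":
    q = LazyQuery(range(10), map_fn=lambda x: x * x, reduce_fn=lambda a, b: a + b, initial=0)
    q.step(3)                     # some other query's work drives a bit of ours
    print(q.partial_result())     # partial sum, 7 records still pending
    print(q.force())              # 285, the complete result
```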

Available Media

SEeSAW - Similarity Exploiting Storage for Accelerating Analytics Workflows

Kalapriya Kannan, Suparna Bhattacharya, Kumar Raj, Muthukumar Murugan, and Doug Voigt, Hewlett Packard Enterprise

The key to successful deployment of big data solutions lies in the timely distillation of meaningful information. This is made difficult by the mismatch between volume and velocity of data at scale and challenges posed by disparate speeds of IO, CPU, memory and communication links of data storage and processing systems. Instead of viewing storage as a bottleneck in this pipeline, we believe that storage systems are best positioned to discover and exploit intrinsic data properties to enhance information density of stored data. This has the potential to reduce the amount of new information that needs to be processed by an analytics workflow. Towards exploring this possibility, we propose SEeSAW, a Similarity Exploiting Storage for Accelerating Analytics Workflows that makes similarity a fundamental storage primitive. We show that SEeSAW transparently eliminates the need for applications to process uninformative data, thereby incurring substantially lower costs on IO, memory, computation and communication while speeding up (about 97% as observed in our experiment) the rate at which actionable outcomes can be derived by analyzing data. By increasing capacity of analytics workloads to absorb more data within the same resource envelope, SEeSAW can open up rich opportunities to reap greater benefits from machine and human generated data accumulated from various sources.

Available Media

Neutrino: Revisiting Memory Caching for Iterative Data Analytics

Erci Xu, The Ohio State University; Mohit Saxena and Lawrence Chiu, IBM Almaden Research Center

In-memory analytics frameworks such as Apache Spark are rapidly gaining popularity as they provide order of magnitude performance speedup over disk-based systems for iterative workloads. For example, Spark uses the Resilient Distributed Dataset (RDD) abstraction to cache data in memory and iteratively compute on it in a distributed cluster.

In this paper, we make the case that existing abstractions such as RDD are coarse-grained and only allow discrete cache levels to be used for caching data. This results in inefficient memory utilization and lower than optimal performance. In addition, relying on the programmer to enforce caching decisions for an RDD makes it infeasible for the system to adapt to runtime changes. To overcome these challenges, we propose Neutrino that employs fine-grained memory caching of RDD partitions and adapts to the use of different in-memory cache levels based on runtime characteristics of the cluster. First, it extracts a data flow graph to capture the data access dependencies between RDDs across different stages of a Spark application without relying on cache enforcement decisions from the programmer. Second, it uses a dynamic-programming based algorithm to guide caching decisions across the cluster and adaptively convert or discard the RDD partitions from the different cache levels.

We have implemented a prototype of Neutrino as an extension to Spark and use four different machine-learning workloads for performance evaluation. Neutrino improves the average job execution time by up to 70% over the use of Spark’s native memory cache levels.
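
As a rough illustration of fine-grained, per-partition caching decisions (not the paper's dynamic-programming algorithm), the following greedy sketch keeps the most frequently re-read partitions in memory until a budget is exhausted. All names and the policy itself are assumptions.

```python
# Greedy illustration of fine-grained cache-level selection per RDD partition.
# Neutrino uses a dynamic-programming algorithm; this only shows the idea of
# deciding per partition instead of per RDD. Sizes, counts, and levels are made up.
def assign_cache_levels(partitions, memory_budget):
    """partitions: dict of partition_id -> (size_in_mb, expected_future_accesses)."""
    plan, used = {}, 0
    # Favor partitions that will be re-read most often.
    for pid, (size, accesses) in sorted(partitions.items(),
                                        key=lambda kv: kv[1][1], reverse=True):
        if accesses > 1 and used + size <= memory_budget:
            plan[pid] = "MEMORY_DESERIALIZED"
            used += size
        else:
            plan[pid] = "DISK"   # a real system could also pick a serialized in-memory level
    return plan


if __name__ == "__main__":
    parts = {"rdd1_p0": (400, 5), "rdd1_p1": (400, 1), "rdd2_p0": (300, 3)}
    print(assign_cache_levels(parts, memory_budget=700))
```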

Available Media
10:40 am–11:15 am Monday

Break with Refreshments

Ballroom Foyer

11:15 am–12:30 pm Monday

Not Your Father's HDD

Session Chair: Vasily Tarasov, IBM Almaden Research Center

Feeding the Pelican: Using Archival Hard Drives for Cold Storage Racks

Richard Black, Austin Donnelly, and Dave Harper, Microsoft Research; Aaron Ogus, Microsoft; Antony Rowstron, Microsoft Research

Microsoft’s Pelican storage rack uses a new class of hard disk drive (HDD), known by vendors as archival class HDD. These HDDs are explicitly designed to store cooler and archival data, differing from existing HDDs by trading performance for cost. Our early Pelican experiences have helped some vendors define the particular characteristics of this class of drive. During the last twelve or so months we have gained considerable data on how these drives perform in Pelicans and in this paper we present data gathered from a test and a production environment. A key design choice for Pelican was to have only a small fraction of the HDDs concurrently spun up making Pelican a harsh environment to operate a HDD. We present data showing how the drives have been used, their power profile, their AFR, and conclude by discussing some issues for the future of these archive HDDs. As flash capacities increase eventually all HDDs will be archive class, so understanding their characteristics is of wide interest.

Available Media

ZEA, A Data Management Approach for SMR

Adam Manzanares, Western Digital Research; Noah Watkins, University of California, Santa Cruz; Cyril Guyot and Damien LeMoal, Western Digital Research; Carlos Maltzahn, University of California, Santa Cruz; Zvonimir Bandic, Western Digital Research

Digital data is projected to double every two years, creating the need for cost-effective and performant storage media [4]. Hard disk drives (HDDs) are a cost-effective storage medium that sits between speedy yet costly flash-based storage, and cheap but slower media such as tape drives. However, virtually all HDDs today use a technology called perpendicular magnetic recording, and the density achieved with this technology is reaching scalability limits due to physical properties of the technology [17]. While new technologies such as shingled magnetic recording (SMR) that further increase areal density are slated to enter the market [6], existing systems software is not prepared to fully utilize these devices because of the unique I/O constraints that they introduce.

SMR requires systems software to conform to the shingling constraint. The shingling constraint is an I/O ordering constraint imposed at the device level, and requires that writes be sequential and contiguous within a subset of the disk, called a zone. Thus, software that requires random block updates must use a scheme to serialize writes to the drive. This scheme can be handled internally in a drive or an alternative approach is to expose the zone abstraction and shingling constraint to the host operating system. Host level solutions are challenging because the shingling constraint is not compatible with software that assumes a random-write block device model, which has been in use for decades. The shingling constraint influences all layers of the I/O stack, and each layer must be made SMR compliant.

In order to manage the shingling write constraint of SMR HDDs, we have designed a zone-based extent allocator that maps ZEA logical blocks (ZBA) to LBAs of the HDD. Figure 1a depicts how ZEA is mapped onto a SMR HDD comprised of multiple types of zones, which are described in Table 1. ZEA writes logical extents, comprised of data and metadata, sequentially onto the SMR zone maintaining the shingling constraint.
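
A minimal sketch of an append-only, zone-based extent allocator in the spirit described above: extents are written sequentially at a zone's write pointer, and a logical-to-physical map records where each extent landed. The data structures and names are illustrative assumptions, not ZEA's implementation.

```python
# Minimal sketch of a zone-based, append-only extent allocator in the spirit of ZEA.
# Data structures and names are illustrative, not the actual implementation.
class Zone:
    def __init__(self, start_lba, length):
        self.start_lba = start_lba
        self.length = length
        self.write_pointer = start_lba      # next LBA that may be written (shingling constraint)


class ZoneExtentAllocator:
    def __init__(self, zones):
        self.zones = zones
        self.zba_map = {}                   # logical extent address (ZBA) -> physical LBA
        self.next_zba = 0

    def append_extent(self, num_blocks):
        """Allocate an extent; writes always land sequentially at a zone's write pointer."""
        for z in self.zones:
            if z.write_pointer + num_blocks <= z.start_lba + z.length:
                lba, z.write_pointer = z.write_pointer, z.write_pointer + num_blocks
                zba, self.next_zba = self.next_zba, self.next_zba + num_blocks
                self.zba_map[zba] = lba
                return zba, lba
        raise RuntimeError("no zone has room; cleaning or a zone reset would be needed")


if __name__ == "__main__":
    alloc = ZoneExtentAllocator([Zone(0, 1000), Zone(1000, 1000)])
    print(alloc.append_extent(128))   # (0, 0): first extent at the head of zone 0
    print(alloc.append_extent(900))   # (128, 1000): does not fit in zone 0, spills to zone 1
```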

Available Media

Evaluating Host Aware SMR Drives

Fenggang Wu, University of Minnesota, Twin Cities; Ming-Chang Yang, National Taiwan University; Ziqi Fan, Baoquan Zhang, Xiongzi Ge, and David H.C. Du, University of Minnesota, Twin Cities

Shingled Magnetic Recording (SMR) technology increases the areal density of hard disk drives. Among the three types of SMR drives on the market today, Host Aware SMR (HA-SMR) drives look the most promising. In this paper, we carry out evaluation to understand the performance of HA-SMR drives with the objective of building large-scale storage systems using this type of drive. We focus on evaluating the special features of HA-SMR drives, such as the open zone issue and media cache cleaning efficiency. Based on our observations we propose a novel host-controlled indirection buffer to enhance the drive’s I/O performance. Finally, we present a case study of the open zone issue to show the potential of this host-controlled indirection buffer for HA-SMR drives.

Available Media
12:30 pm–2:00 pm Monday

Luncheon for Workshop Attendees

Colorado Ballroom E

2:00 pm–3:15 pm Monday

Flash Looking New

Session Chair: Song Jiang, Wayne State University

Avoiding the Streetlight Effect: I/O Workload Analysis with SSDs in Mind

Gala Yadgar and Moshe Gabel, Technion—Israel Institute of Technology

Storage systems are designed and optimized relying on wisdom derived from analysis studies of file-system and block-level workloads. However, while SSDs are becoming a dominant building block in many storage systems, their design continues to build on knowledge derived from analysis targeted at hard disk optimization. Though still valuable, it does not cover important aspects relevant for SSD performance. In a sense, we are “searching under the streetlight”, possibly missing important opportunities for optimizing storage system design.

We present the first I/O workload analysis designed with SSDs in mind. We characterize traces from four repositories and examine their ‘temperature’ ranges, sensitivity to page size, and ‘logical locality’. Our initial results reveal nontrivial aspects that can significantly influence the design and performance of SSD-based systems.

Available Media

NVMeDirect: A User-space I/O Framework for Application-specific Optimization on NVMe SSDs

Hyeong-Jun Kim, Sungkyunkwan University; Young-Sik Lee, Korea Advanced Institute of Science and Technology (KAIST); Jin-Soo Kim, Sungkyunkwan University

The performance of storage devices has increased significantly due to emerging technologies such as Solid State Drives (SSDs) and the Non-Volatile Memory Express (NVMe) interface. However, the complex I/O stack of the kernel impedes utilizing the full performance of NVMe SSDs. Application-specific optimization is also difficult in the kernel because the kernel must provide generality and fairness.

In this paper, we propose a user-level I/O framework which improves the performance by allowing user applications to access commercial NVMe SSDs directly without any hardware modification. Moreover, the proposed framework provides flexibility where user applications can select their own I/O policies including I/O completion method, caching, and I/O scheduling. Our evaluation results show that the proposed framework outperforms the kernel-based I/O by up to 30% on microbenchmarks and by up to 15% on Redis.

Available Media

Optimizing Flash-based Key-value Cache Systems

Zhaoyan Shen, Hong Kong Polytechnic University; Feng Chen and Yichen Jia, Louisiana State University; Zili Shao, Hong Kong Polytechnic University

Flash-based key-value cache systems, such as Facebook’s McDipper [1] and Twitter’s Fatcache [2], provide a cost-efficient solution for high-speed key-value caching. These cache solutions typically take commercial SSDs and adopt a Memcached-like scheme to store and manage key-value pairs in flash. Such a practice, though simple, is inefficient. We advocate to reconsider the hardware/software architecture design by directly opening device-level details to key-value cache systems. This co-design approach can effectively bridge the semantic gap and closely connect the two layers together. Leveraging the domain knowledge of key-value caches and the unique device-level properties, we can maximize the efficiency of a key-value cache system on flash devices while minimizing its weakness. We are implementing a prototype based on the Open-channel SSD hardware platform. Our preliminary experiments show very promising results.

Available Media
3:15 pm–3:45 pm Monday

Break with Refreshments

Ballroom Foyer

3:45 pm–5:25 pm Monday

On Being Distributed

Session Chair: Marcos Aguilera, VMware Research

ClusterOn: Building Highly Configurable and Reusable Clustered Data Services Using Simple Data Nodes

Ali Anwar and Yue Cheng, Virginia Polytechnic Institute and State University; Hai Huang, IBM T. J. Watson Research Center; Ali R. Butt, Virginia Polytechnic Institute and State University

The growing variety of data storage and retrieval needs is driving the design and development of an increasing number of distributed storage applications such as key-value stores, distributed file systems, object stores, and databases. We observe that, to a large extent, such applications implement their own way of handling features such as data replication, failover, consistency, cluster topology, and leadership election. We found that 45–82% of the code in six popular distributed storage applications can be classified as implementations of such common features. While such implementations allow for deeper optimizations tailored for a specific application, writing new applications to satisfy the ever-changing requirements of new types of data or I/O patterns is challenging, as it is notoriously hard to get all the features right in a distributed setting.

In this paper, we argue that for most modern storage applications, the common feature implementation (i.e., the distributed part) can be automated and offloaded, so developers can focus on the core application functions. We are designing a framework, ClusterOn, which aims to take care of the "messy plumbing" of distributed storage applications. The envisioned goal is that a developer simply "drops" a non-distributed application into ClusterOn, which will convert it into a scalable and highly configurable distributed application.

Available Media

Silver: A Scalable, Distributed, Multi-versioning, Always Growing (Ag) File System

Michael Wei, VMware Research and University of California, San Diego; Amy Tai, VMware Research and Princeton University; Chris Rossbach, Ittai Abraham, and Udi Wieder, VMware Research; Steven Swanson, University of California, San Diego; Dahlia Malkhi, VMware Research

The storage needs of users have shifted from just needing to store data to requiring a rich interface which enables the efficient query of versions, snapshots and creation of clones. Providing these features in a distributed file system while maintaining scalability, strong consistency and performance remains a challenge. In this paper we introduce Silver, a file system which leverages the Corfu distributed logging system to not only store data, but to provide fast strongly consistent snapshots, clones and multi-versioning while preserving the scalability and performance of the distributed shared log. We describe and implement Silver using a FUSE prototype and show its performance characteristics.

Available Media

Exo-clones: Better Container Runtime Image Management across the Clouds

Richard P. Spillane, Wenguang Wang, Luke Lu, Maxime Austruy, Christos Karamanolis, and Rawlinson Rivera, VMware

Our key innovation is to allow volume snapshots in VDFS (our native hyper-converged distributed file system) to be exported to stand-alone regular files, called exo-clones, that can be imported into another VDFS cluster efficiently (zero-copy when possible). Our exo-clones carry provenance, policy, and, similar to git commits, the fingerprints of the parent clones from which they were derived. They are analogous to commits in a distributed source control system, and can be stored outside of VDFS, rebased, and signed. Although they can be unpacked to any directory, when used with VDFS they can be mounted directly with zero copying and are instantly available to all nodes mounting VDFS. VDFS with exo-clones provides the format and the tools necessary to both transfer and run encapsulated applications in both public and private clouds, and in both test/dev and production environments.

Available Media

Finding Consistency in an Inconsistent World: Towards Deep Semantic Understanding of Scale-out Distributed Databases

Neville Carvalho, Hyojun Kim, Maohua Lu, Prasenjit Sarkar, Rohit Shekhar, Tarun Thakur, Pin Zhou, Datos IO; Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

We present a new problem in data storage: how to build efficient backup and restore tools for increasingly popular Next-generation Eventually Consistent STorage systems (NECST). We show that the lack of a concise, consistent, logical view of data at a point-in-time is the key underlying problem; we suggest a deep semantic understanding of the data stored within the system of interest as a solution. We discuss research and productization challenges in this new domain, and present the status of our platform, Datos CODR (Consistent Orchestrated Distributed Recovery), which can extract consistent and deduplicated backups from NECST systems such as Cassandra, MongoDB, and many others.

Available Media
6:00 pm–7:00 pm Monday

Joint Poster Session and Happy Hour with HotCloud

Colorado Ballroom A–E

 

Tuesday, June 21, 2016

8:00 am–9:00 am Tuesday

Continental Breakfast

Ballroom Foyer

9:00 am–10:30 am Tuesday

Joint Keynote Address with HotCloud

What's Changing in Big Data?

Matei Zaharia, Massachusetts Institute of Technology

Matei Zaharia is an assistant professor of computer science at MIT as well as CTO of Databricks, the company commercializing Apache Spark. He is broadly interested in computer systems, data centers and data management. He started the Spark project while he was a PhD student at UC Berkeley, and he has also contributed to other open source cluster computing projects such as Apache Mesos and Apache Hadoop. Matei received the 2014 ACM Doctoral Dissertation Award for his graduate work.

Big data analytics became a hot research topic nearly ten years ago, but since that time, a lot of things have changed. On the hardware side, trends such as the slowdown of processing with respect to I/O are starting to affect the design of big data systems. On the application side, big data systems are increasingly being used by non-programmers and require similar forms of interaction to "small data" analysis tools. Finally, big data systems are increasingly provided "as a service" on cloud infrastructure. I'll talk about these changes from the perspective of the Apache Spark project and from my experience at a company offering a cloud service for big data (Databricks).

Available Media
10:30 am–11:00 am Tuesday

Break with Refreshments

Ballroom Foyer

11:00 am–12:15 pm Tuesday

Revisiting Mobile Storage, Again!

Session Chair: Vijay Chidambaram, VMware and The University of Texas at Austin

Why Do We Always Blame The Storage Stack?

Hao Luo, Hong Jiang, and Myra B. Cohen, University of Nebraska—Lincoln

Much research effort has been devoted to improving the performance of the I/O stack in mobile devices, but limited time has been spent evaluating mobile application performance from the user’s perspective. In this paper, we try to understand how applications running on the newest devices behave with respect to this metric. We develop a methodology for quantifying user-perceived latency and use it to evaluate four common application benchmarks with I/O stack optimization on two of the latest smartphones. Contrary to our expectation, we find that (i) these applications respond reasonably fast and (ii) the user-perceived latency does not drastically (at most 11.8%) benefit from I/O stack optimizations.

Available Media

An Empirical Study of File-System Fragmentation in Mobile Storage Systems

Cheng Ji, City University of Hong Kong; Li-Pin Chang, National Chiao-Tung University; Liang Shi, Chongqing University; Chao Wu, City University of Hong Kong; Qiao Li, Chongqing University; Chun Jason Xue, City University of Hong Kong

Nowadays, mobile devices have become necessities of everyday life. However, users may notice that after a long period of usage, mobile devices start experiencing sluggish response. In this paper, by conducting an empirical study of file-system fragmentation on several aged mobile devices, we found that: 1) files may suffer from severe fragmentation, and database files are among the most severely fragmented files; 2) file-system fragmentation does affect the performance of mobile devices, and the impact varies from device to device. Conventional defragmentation schemes do not work well on mobile devices because they do not consider the characteristics specific to mobile storage. We then suggest two pilot solutions to enhance file defragmentation for mobile devices.

Available Media

Pixelsior: Photo Management as a Platform Service for Mobile Apps

Kyungho Jeon, Sharath Chandrashekhara, Karthik Dantu, and Steven Y. Ko, University at Buffalo

Photo management has become a sizable fraction of our computer interaction. Due to economic incentives, every software company wants to restrict users to using their software for photo management and use. Unfortunately, this results in duplication of images, repeated image manipulation operations, and an overall uneven and siloed user experience. In this paper, we motivate the need for a dedicated platform service for photo management which can not only manage the photos on one device, but also transparently manage content adaptation, image manipulation and propagation of the manipulation to all the applications on a device, and all devices using the service. Pixelsior presents our study of the requirements of such a system as well as a preliminary design motivated by requirements of consistency and efficiency.

Available Media
12:15 pm–2:00 pm Tuesday

Luncheon for Workshop Attendees

Colorado Ballroom A–E

2:00 pm–3:40 pm Tuesday

Replicate, Dedup, then NVM

Session Chair: William J. Bolosky, Microsoft

99 Deduplication Problems

Philip Shilane, Ravi Chitloor, and Uday Kiran Jonnala, EMC Corporation

Deduplication is a widely studied capacity optimization technique that replaces redundant regions of data with references. Not only is deduplication an ongoing area of academic research, but numerous vendors also have deduplicated storage products. Historically, most deduplication-related publications focus on a narrow range of topics: maximizing deduplication ratios and read/write performance. While future research will continue to optimize these areas, we believe that there are numerous novel, deduplication-specific problems that have been largely ignored in the academic community. Based on feedback from customers as well as internal architecture discussions, we present new deduplication problems that will hopefully spur the next generation of research.

Available Media

A Simulation Result of Replicating Data with Another Layout for Reducing Media Exchange of Cold Storage

Satoshi Iwata and Kensuke Shiozawa, Fujitsu Laboratories Ltd.

Cold storage devices such as tape and optical discs are a good solution for reducing the total cost of ownership for storing data. However, there is a drawback in that media and drives are separated, and placing media into drives when accessing data takes a few minutes. Though placing correlated data together in the same medium reduces media exchange, multi-dimensional searches disrupt it. We propose two approaches which replicate data and place the replicas in a different layout to solve this problem. By concentrating on relative latency reduction or utilizing replicas originally generated for avoiding data loss, our method achieves high latency reduction with restricted capacity efficiency loss. A simulation result shows 31% average relative latency reduction with capacity efficiency remaining at 91%.

Available Media

Deduplicating Compressed Contents in Cloud Storage Environment

Zhichao Yan and Hong Jiang, The University of Texas at Arlington; Yujuan Tan, Chongqing University; Hao Luo, University of Nebraska—Lincoln

Data compression and deduplication are two common approaches to increasing storage efficiency in the cloud environment. Both users and cloud service providers have economic incentives to compress their data before storing it in the cloud. However, our analysis indicates that compressed packages of different data and differently compressed packages of the same data are usually fundamentally different from one another even when they share a large amount of redundant data. Existing data deduplication systems cannot detect redundant data among them. We propose the X-Ray Dedup approach to extract from these packages the unique metadata, such as the “checksum” and “file length” information, and use it as the compressed file’s content signature to help detect and remove file-level data redundancy. X-Ray Dedup is shown by our evaluations to be capable of breaking the boundaries of compressed packages and significantly reducing their size requirements, thus further optimizing storage space in the cloud.
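
A small sketch of the signature idea, assuming zip packages: instead of hashing compressed bytes (which differ across packages even for identical content), read each member's stored CRC and uncompressed length from the archive metadata and use the pair as a file-level signature. The dedup index here is an illustrative assumption, not X-Ray Dedup's implementation.

```python
# Sketch of signature extraction from compressed packages, assuming zip archives.
# zipfile is from the Python standard library; the dedup index is an illustrative assumption.
import zipfile


def file_signatures(archive_path):
    """Yield (member_name, signature) for every member of a zip package.
    The signature is the CRC-32 of the *uncompressed* data plus its length,
    both read from the archive's metadata, so it is stable across compressors."""
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            yield info.filename, (info.CRC, info.file_size)


def find_redundant(archive_paths):
    """Report members whose signature already appeared in another package."""
    seen, redundant = {}, []
    for path in archive_paths:
        for name, sig in file_signatures(path):
            if sig in seen:
                redundant.append((path, name, seen[sig]))
            else:
                seen[sig] = (path, name)
    return redundant
```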

Available Media

Non-volatile Memory through Customized Key-value Stores

Leonardo Mármol, Jorge Guerra, and Marcos K. Aguilera, VMware

Non-volatile memory, or NVM, is coming. Several technologies are maturing (FeRAM, ReRAM, PCM, DWM, FJG RAM), and soon we expect products from Intel, Micron, HP, SanDisk, and/or Samsung. Some of these products promise memory density close to flash and performance within a reasonable factor of DRAM. This technology could substantially improve the performance of software systems, especially storage systems.

Unfortunately, using NVM is hard: each technology has its quirks, and the details of products are not yet available. We need a way to integrate NVM into our software systems, without full knowledge of all the NVM product details and without having to redesign every software system for each forthcoming NVM technology.

We advocate the use of customized key-value stores. Rather than programming directly on NVM, developers (1) design a key-value store customized for the application, (2) implement the key-value store for the target NVM technology, and (3) program the application using the key-value store. When new NVM products emerge, with similar performance characteristics but different access mechanisms, developers need only modify the key-value store implementation, which is simpler, faster, and cheaper than redesigning the application. Thus, the key-value store serves as a middle layer that hides the details of the NVM technology, while providing a simple and familiar interface to the application. Customization ensures that the design is performant and simple.
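
A minimal sketch of the middle-layer idea, assuming a tiny illustrative interface: the application programs against a small key-value API, and each NVM technology supplies its own backend behind it. The interface and the stand-in backend are assumptions, not the authors' design.

```python
# Sketch of the key-value store as a middle layer: the application sees only a small,
# stable interface; each NVM technology gets its own backend. Names are illustrative.
from abc import ABC, abstractmethod
from typing import Optional


class KVStore(ABC):
    """The only surface the application touches; NVM details stay behind it."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def flush(self) -> None: ...   # persistence point; its meaning differs per NVM technology


class DramBackedStore(KVStore):
    """Stand-in backend for testing; a PCM or ReRAM backend would replace only this class."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

    def flush(self):
        pass   # a real NVM backend would issue its device-specific persist/fence here


def application(store):
    # Application code depends only on the KVStore interface, not on the NVM product.
    store.put(b"user:42", b"alice")
    store.flush()
    return store.get(b"user:42")


if __name__ == "__main__":
    print(application(DramBackedStore()))   # b'alice'
```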

Available Media
3:40 pm–4:10 pm Tuesday

Break with Refreshments

Ballroom Foyer

4:10 pm–5:25 pm Tuesday

Be More Flashy

Session Chair: Fred Douglis, EMC

Write Amplification Reduction in Flash-Based SSDs Through Extent-Based Temperature Identification

Mansour Shafaei and Peter Desnoyers, Northeastern University; Jim Fitzpatrick, SanDisk Corporation

We apply an extent-based clustering technique to the problem of identifying “hot” or frequently-written data in an SSD, allowing such data to be segregated for improved cleaning performance. We implement and evaluate this technology in simulation, using a page-mapped FTL with Greedy cleaning and separate hot and cold write frontiers. We compare it with two recently proposed hot data identification algorithms, Multiple Hash Functions and Multiple Bloom Filters, keeping the remainder of the FTL / cleaning algorithm unchanged. In almost all cases write amplification was lower with the extent-based algorithm; although in some cases the improvement was modest, in others it was as much as 20%. These gains are achieved with very small amounts of memory, e.g. roughly 10KB for the implementation tested, an important factor for SSDs where most DRAM is dedicated to address maps and data buffers.
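
As a rough illustration of hot/cold segregation by extent (not the paper's clustering algorithm), the sketch below counts writes per fixed-size extent and routes writes to a hot or cold frontier once a threshold is crossed. The granularity and threshold are assumed values.

```python
# Rough illustration of segregating hot and cold writes by extent-level write counts
# (not the paper's clustering algorithm). Extents here are fixed-size LBA ranges and
# the threshold is an assumed value.
from collections import defaultdict

EXTENT_BLOCKS = 256      # assumed blocks per extent
HOT_THRESHOLD = 4        # assumed writes before an extent counts as hot


class TemperatureTracker:
    def __init__(self):
        self.write_counts = defaultdict(int)

    def record_write(self, lba):
        self.write_counts[lba // EXTENT_BLOCKS] += 1

    def frontier_for(self, lba):
        """Route the write to the hot or cold write frontier."""
        return "hot" if self.write_counts[lba // EXTENT_BLOCKS] >= HOT_THRESHOLD else "cold"


if __name__ == "__main__":
    t = TemperatureTracker()
    for _ in range(5):
        t.record_write(1000)       # repeated writes land in the same extent
    print(t.frontier_for(1000))    # hot
    print(t.frontier_for(900000))  # cold
```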

Available Media

Improving I/O Resource Sharing of Linux Cgroup for NVMe SSDs on Multi-core Systems

Sungyong Ahn and Kwanghyun La, Samsung Electronics Co.; Jihong Kim, Seoul National University

In container-based virtualization, where multiple isolated containers share I/O resources on top of a single operating system, efficient and proportional I/O resource sharing is an important system requirement. Motivated by a lack of adequate support for I/O resource sharing in Linux Cgroup for high-performance NVMe SSDs, we developed a new weight-based dynamic throttling technique which can provide proportional I/O sharing for container-based virtualization solutions running on NUMA multi-core systems with NVMe SSDs. By intelligently predicting the future I/O bandwidth requirement of containers based on past I/O service rates of I/O-active containers, and modifying the current Linux Cgroup implementation for better NUMA-scalable performance, our scheme achieves highly accurate I/O resource sharing while reducing wasted I/O bandwidth. Based on a Linux kernel 4.0.4 implementation running on a 4-node NUMA multi-core system with NVMe SSDs, our experimental results show that the proposed technique can efficiently share the I/O bandwidth of NVMe SSDs among multiple containers according to given I/O weights.
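
A sketch of weight-proportional bandwidth allocation under predicted demand, assuming a simple water-filling style redistribution of unused shares; the prediction and redistribution details are assumptions, not the paper's algorithm.

```python
# Sketch of weight-proportional I/O bandwidth allocation with redistribution of
# shares a container cannot use. The demand prediction and the redistribution loop
# are assumptions, not the paper's algorithm.
def compute_limits(total_bw, containers):
    """containers: dict name -> (cgroup_weight, predicted_demand); bandwidth in MB/s."""
    weights = {n: w for n, (w, _) in containers.items()}
    demand = {n: d for n, (_, d) in containers.items()}
    limits = {n: 0.0 for n in containers}
    active = set(containers)
    remaining = total_bw
    while active and remaining > 1e-6:
        total_w = sum(weights[n] for n in active)
        satisfied = set()
        for n in active:
            share = remaining * weights[n] / total_w            # weighted slice of what is left
            grant = min(share, max(0.0, demand[n] - limits[n]))  # never exceed predicted demand
            limits[n] += grant
            if limits[n] >= demand[n] - 1e-6:
                satisfied.add(n)
        remaining = total_bw - sum(limits.values())
        if not satisfied:      # everyone took their full slice; nothing left to redistribute
            break
        active -= satisfied
    return limits


if __name__ == "__main__":
    # Three containers with weights 500:250:250; c2's demand is low, so its spare
    # share is redistributed to c1 and c3 in proportion to their weights.
    print(compute_limits(3000, {"c1": (500, 2500), "c2": (250, 500), "c3": (250, 2000)}))
```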

Available Media

Unblinding the OS to Optimize User-Perceived Flash SSD Latency

Woong Shin, Seoul National University; Jaehyun Park, Arizona State University; Heon Y. Yeom, Seoul National University

In this paper, we present a flash solid-state drive (SSD) optimization that provides hints of SSD internal behaviors, such as device I/O time and buffer activities, to the OS in order to mitigate the impact of I/O completion scheduling delays. The hints enable the OS to make reliable latency predictions for each I/O request, so that it can make accurate scheduling decisions about whether to busy-wait or yield the CPU, ultimately improving user-perceived I/O performance. This was achieved by implementing latency predictors supported by an SSD I/O behavior tracker within the SSD that tracks I/O behavior at the level of internal resources, such as DRAM buffers or NAND chips. Evaluations with an SSD prototype based on a Xilinx Zynq-7000 FPGA and MLC flash chips showed that our optimizations enabled the OS to mask the scheduling delays without severely impacting system parallelism, compared to prior I/O completion methods.
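
A minimal sketch of the completion-scheduling decision such hints enable: if the predicted latency of a request is below the cost of blocking and being woken, busy-wait; otherwise yield. The predictor, hint fields, and latency numbers are illustrative assumptions, not the paper's model.

```python
# Sketch of the yield-vs-busy-wait decision that device hints enable. The predictor,
# hint fields, and latency numbers below are illustrative assumptions, not the paper's model.
CONTEXT_SWITCH_COST_US = 20       # assumed round-trip cost of blocking and being woken


def predict_latency_us(request, hints):
    """Estimate completion time from SSD-provided hints: a read served from the
    device's DRAM buffer is far faster than one that must reach a NAND chip."""
    if hints.get("in_device_buffer"):
        return 5
    base = 80 if request["op"] == "read" else 300
    return base + 50 * hints.get("queued_ops_on_chip", 0)


def completion_strategy(request, hints):
    """Busy-wait for very short requests; otherwise yield the CPU to other work."""
    return "spin" if predict_latency_us(request, hints) < CONTEXT_SWITCH_COST_US else "yield"


if __name__ == "__main__":
    print(completion_strategy({"op": "read"}, {"in_device_buffer": True}))   # spin
    print(completion_strategy({"op": "read"}, {"queued_ops_on_chip": 2}))    # yield
```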

Available Media