HotStorage '13 Workshop Program

All sessions will be held in the Crystal Room unless otherwise noted.

The full papers published by USENIX for the workshop are available for download, as a complete archive or individually below, to workshop registrants immediately and to everyone beginning June 27, 2013. Everyone can view the abstracts immediately. Copyright to the individual works is retained by the author[s].

Download Paper Archives


Thursday, June 27, 2013

8:30 a.m.–9:00 a.m. Thursday

Continental Breakfast

Market Street Foyer

9:00 a.m.–10:30 a.m. Thursday

Keynote Address

Program Chair: Ajay Gulati, VMware

E Pluribus Unum: The Promise, Limits, and Opportunities of Scale-out Storage

Peter Godman, CEO, Qumulo

While the world's stored data is growing exponentially—doubling roughly every two years—hardware grows exponentially cheaper. Long-term, up-front storage acquisition practices create prodigious idle capacity purchased at premium prices. Scale-out storage promises to address this problem by turning static, monolithic systems into growing, distributed systems. Scale-out must clear many hurdles to become a seamless replacement for all monolithic storage, and peta-scale scale-out storage systems create new data manageability challenges.

Peter Godman brings 20 years of industry systems experience to Qumulo. As VP of Engineering and CEO of Corensic, Peter brought the world's first thin-hypervisor based product to market. As Director of Software Engineering at Isilon, he led development of several major releases of Isilon's award-winning OneFS distributed filesystem and was inventor of 18 patented technologies. Peter studied math and computer science at MIT.

Available Media

10:30 a.m.–11:00 a.m. Thursday

Break with Refreshments

Market Street Foyer

11:00 a.m.–12:30 p.m. Thursday

Filesystems Everywhere

Session Chair: Binny Gill, Nutanix

Fault Isolation and Quick Recovery in Isolation File Systems

Lanyue Lu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison

File systems do not properly isolate faults that occur within them. As a result, a single fault may adversely affect multiple clients, making the entire file system unavailable. We introduce a new file system abstraction, called file pod, that allows applications to manage failure and recovery policies explicitly for a group of files. Based on this abstraction, we propose the isolation file system, which provides fine-grained fault isolation and quick recovery.
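
To make the abstraction concrete, here is a minimal sketch of what a file-pod interface might look like. The names (FilePod, set_policy) and the policy strings are illustrative guesses, not the authors' actual API.

    import os

    class FilePod:
        """Hypothetical file-pod handle: a group of files that fail and
        recover together, isolated from every other pod."""
        def __init__(self, name):
            self.name = name
            self.files = set()
            self.policy = {"on_fault": "fail-pod-only", "recovery": "pod-fsck"}

        def add(self, path):
            # Bind a file (and its metadata) to this pod's fault domain.
            self.files.add(os.path.abspath(path))

        def set_policy(self, on_fault, recovery):
            # Per-pod failure/recovery policy instead of one global policy.
            self.policy = {"on_fault": on_fault, "recovery": recovery}

    # A database app isolates its files: a fault inside this pod would make
    # only the pod unavailable; the rest of the file system stays online.
    pod = FilePod("sqlite-data")
    pod.add("/var/app/db.sqlite")
    pod.set_policy(on_fault="fail-pod-only", recovery="journal-replay")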

Available Media

*-Box: Towards Reliability and Consistency in Dropbox-like File Synchronization Services

Yupu Zhang, Charlotte Dragga, Andrea Arpaci-Dusseau, and Remzi Arpaci-Dusseau, University of Wisconsin—Madison

Cloud-based file synchronization services, such as Dropbox, have never been more popular. They provide excellent reliability and durability in their server-side storage, and can provide a consistent view of their synchronized files across multiple clients. However, the loose coupling between these services and the local file system may, in some cases, turn these benefits into drawbacks. In this paper, we show that these services can silently propagate both local data corruption and the results of inconsistent crash recovery, and cannot guarantee that the data they store reflects the actual state of the disk. We propose techniques to prevent and recover from these problems by reducing the separation between local file systems and synchronization clients, providing clients with deeper knowledge of file system activity and allowing the file system to take advantage of the correct data stored remotely.
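
A minimal sketch of the detection idea, assuming the client keeps a (checksum, mtime) record per file from the last successful sync. The heuristic below (bytes changed without a recorded timestamp update suggests corruption) is an illustrative stand-in for the paper's tighter file-system integration.

    import hashlib, os

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.hexdigest()

    def classify(path, last_sync):
        """last_sync maps path -> (checksum, mtime) at last successful sync."""
        digest, mtime = sha256(path), os.path.getmtime(path)
        old = last_sync.get(path)
        if old is None:
            return "new"            # first sight of this file: upload it
        old_digest, old_mtime = old
        if digest == old_digest:
            return "clean"          # nothing to sync
        if mtime == old_mtime:
            # Bytes changed with no corresponding file-system update: likely
            # silent corruption. Restore from the (correct) cloud copy
            # instead of propagating the damage to every synced client.
            return "corrupt"
        return "modified"           # legitimate edit: safe to upload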

Available Media

Mobile Data Sync in a Blink

Nitin Agrawal, Akshat Aranya, and Cristian Ungureanu, NEC Labs America

Mobile applications are becoming increasingly data-centric — often relying on cloud services to store, share, and analyze data. App developers frequently have to manage the local storage on the device (e.g., SQLite databases, file systems) as well as data synchronization with cloud services. Developers have to address common issues such as data packaging, handling network failures, supporting disconnected operations, propagating changes, and detecting and resolving conflicts. To free mobile developers from this burden, we are building Simba, a platform for rapidly developing and deploying data-centric mobile apps. Simba provides a unified storage and synchronization API for both structured data and unstructured objects. Apps can specify a data model spanning tables and objects, and atomically sync such data with the cloud without worrying about network disruptions. Simba is also frugal in consuming network resources.
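
A sketch of what such a unified table-plus-object API might look like; SimbaTable, put, and sync are hypothetical names, and FakeUplink stands in for the cloud service.

    class FakeUplink:
        def send_atomic(self, table, batch):
            return True     # stand-in for an all-or-nothing cloud round-trip

    class SimbaTable:
        """One data model spanning structured columns and a blob object."""
        def __init__(self, name, schema):
            self.name, self.schema = name, schema
            self.rows, self.dirty = [], []

        def put(self, cols, obj=None):
            # The row and its attached object form one logical unit.
            self.rows.append({"cols": cols, "obj": obj})
            self.dirty.append(len(self.rows) - 1)

        def sync(self, uplink):
            # All-or-nothing: either every dirty row (with its object)
            # reaches the cloud or none does, so a dropped connection cannot
            # leave a half-synced photo-plus-caption pair behind.
            batch = [self.rows[i] for i in self.dirty]
            if uplink.send_atomic(self.name, batch):
                self.dirty.clear()

    photos = SimbaTable("photos", schema={"id": int, "caption": str})
    photos.put({"id": 1, "caption": "beach"}, obj=b"...jpeg bytes...")
    photos.sync(FakeUplink())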

Available Media


12:30 p.m.–2:00 p.m. Thursday

FCW Luncheon

Market Street Foyer

2:00 p.m.–3:30 p.m. Thursday

Everything About NAND

Session Chair: Steven Swanson, University of California, San Diego

Improving NAND Endurance by Dynamic Program and Erase Scaling

Jaeyong Jeong, Sangwook Shane Hahn, Sungjin Lee, and Jihong Kim, Seoul National University

We propose a new approach, called dynamic program and erase scaling (DPES), for improving the endurance of NAND flash memory. The DPES approach is based on our key finding that NAND endurance depends on the erase voltage as well as on the number of P/E cycles. Since NAND endurance has a near-linear dependence on the erase voltage, lowering the erase voltage is an effective way of improving endurance. By modifying NAND chips to support multiple write modes with different erase voltages, DPES enables flash software to exploit the new tradeoff between NAND endurance and write speed. In this paper, we present a novel NAND endurance model which accurately captures the tradeoff between endurance and write speed under dynamic program and erase scaling. Based on our NAND endurance model, we have implemented the first DPES-aware FTL, called autoFTL, which improves NAND endurance with negligible degradation in overall write throughput. Our experimental results using various I/O traces show that autoFTL can improve the maximum number of P/E cycles by 45% over an existing DPES-unaware FTL, with less than a 0.2% decrease in overall write throughput.
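
As a back-of-the-envelope illustration of the endurance/write-speed tradeoff, consider a drive mixing two write modes with different erase voltages. The linear slope and every number below are made-up assumptions for the example, not the paper's measured model.

    V_NOMINAL = 1.00        # normalized erase voltage of a standard erase
    BASE_PE_CYCLES = 3000   # assumed endurance at the nominal voltage

    def endurance_gain(v_erase):
        # Near-linear dependence: a shallower erase wears the cell less.
        # The slope 2.0 is an illustrative coefficient.
        return 1.0 + 2.0 * (V_NOMINAL - v_erase)

    # Two hypothetical write modes: slow writes tolerate a shallower erase.
    modes = {"fast": {"v_erase": 1.00, "share": 0.6},
             "slow": {"v_erase": 0.85, "share": 0.4}}

    # Effective endurance if wear per cycle scales inversely with the gain.
    wear_per_cycle = sum(m["share"] / endurance_gain(m["v_erase"])
                         for m in modes.values())
    print(f"effective P/E cycles: {BASE_PE_CYCLES / wear_per_cycle:,.0f}")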

Available Media

Dynamic Interval Polling and Pipelined Post I/O Processing for Low-Latency Storage Class Memory

Dong In Shin, Taejin Infotech; Young Jin Yu and Hyeong S. Kim, Seoul National University; Jae Woo Choi and Do Yung Jung, Taejin Infotech; Heon Y. Yeom, Seoul National University

Emerging non-volatile memory technologies, when deployed as disk replacements, raise software-stack and interface issues that disk-based storage systems have never had to consider. In this work, we present new cooperative software and hardware schemes to address performance issues in deploying storage-class memory as a storage device. First, we propose a new polling scheme, called dynamic interval polling, which avoids unnecessary polls and reduces the burden on the storage system bus. Second, we propose a pipelined execution scheme between the storage device and the host OS, called pipelined post I/O processing. By extending vendor-specific I/O interfaces between software and hardware, we improve the responsiveness of I/O requests without sacrificing throughput.
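
A simplified sketch of the dynamic-interval idea: learn the device's typical completion latency and sleep through most of it instead of polling continuously. The EWMA weights and the guard factor are illustrative choices.

    import time

    class DynamicPoller:
        def __init__(self, guard=0.9):
            self.avg_latency = None   # EWMA of observed completion latency
            self.guard = guard        # wake slightly early, avoid overshoot

        def wait_for_completion(self, is_done):
            start = time.monotonic()
            if self.avg_latency:
                # Skip the polls that could not possibly succeed.
                time.sleep(self.avg_latency * self.guard)
            while not is_done():      # short polling tail keeps latency low
                pass
            latency = time.monotonic() - start
            self.avg_latency = (latency if self.avg_latency is None
                                else 0.8 * self.avg_latency + 0.2 * latency)

    # usage: DynamicPoller().wait_for_completion(lambda: device_is_done())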

Available Media

What Systems Researchers Need to Know about NAND Flash

Peter Desnoyers, Northeastern University 

Flash memory has been an active topic of research in recent years, but hard information about the parameters and behavior of both flash chips and SSDs has been difficult to obtain for those outside the industry. This paper addresses several misconceptions found in the literature in order to help future researchers avoid some of the errors found in prior work.

We examine the following topics: flash device parameters such as page and erase block size, speed, and reliability, as well as flash translation layer (FTL) requirements and behavior under random and sequential I/O. We have endeavored to find public sources for our claims, and provide experimental evidence in several cases. In doing so, we provide previously unpublished results showing the viability of random writes on commodity SSDs when restricted to a sufficiently small portion of the logical address space.

Available Media


3:30 p.m.–4:00 p.m. Thursday

Break with Refreshments

Market Street Foyer

4:00 p.m.–5:30 p.m. Thursday

RAID Parade

Session Chair: Peter Desnoyers, Northeastern University

Don’t Let RAID Raid the Lifetime of Your SSD Array

Sangwhan Moon and A. L. Narasimha Reddy, Texas A&M University

Parity protection at the system level is typically employed to build reliable storage systems. However, careful consideration is required when SSD-based systems employ parity protection. First, additional writes are required for parity updates. Second, parity consumes space on the device, which results in write amplification from less efficient garbage collection at higher space utilization.

This paper analyzes the effectiveness of SSD-based RAID and discusses its potential benefits and drawbacks in terms of reliability. A Markov model is presented to estimate the lifetime of SSD-based RAID systems in different environments. In a single array, our preliminary results show that parity protection provides benefit only at considerably low space utilization and low data access rates. However, in a large system, RAID improves data lifetime even when write amplification is taken into account.
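
The two effects compound, as a rough worked example shows. The RAID-5 geometry, the utilization numbers, and the classic WA ≈ 1/(1 - u) greedy-GC approximation are illustrative assumptions, not the paper's Markov model.

    n_disks = 5           # RAID-5: 4 data disks + 1 disk's worth of parity
    # A small read-modify-write update touches one data page and one parity
    # page, so the array absorbs 2 physical writes per logical write:
    parity_amplification = 2.0

    # Parity also consumes capacity, pushing utilization up and making
    # garbage collection less efficient (approximation: WA ~ 1 / (1 - u)).
    util_no_parity = 0.70
    util_with_parity = util_no_parity * n_disks / (n_disks - 1)   # 0.875
    wa_no_parity = 1 / (1 - util_no_parity)       # ~3.3
    wa_with_parity = 1 / (1 - util_with_parity)   # ~8.0

    extra_wear = parity_amplification * wa_with_parity / wa_no_parity
    print(f"extra wear vs. unprotected disks: {extra_wear:.1f}x")   # ~4.8x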

Available Media

A Solution to the Network Challenges of Data Recovery in Erasure-coded Distributed Storage Systems: A Study on the Facebook Warehouse Cluster

K. V. Rashmi and Nihar B. Shah, University of California, Berkeley; Dikang Gu, Hairong Kuang, and Dhruba Borthakur, Facebook; Kannan Ramchandran, University of California, Berkeley

Erasure codes, such as Reed-Solomon (RS) codes, are being increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal storage efficiency, they incur significantly high network and disk usage during recovery of missing data.

In this paper, we first present a study on the impact of recovery operations of erasure-coded data on the data-center network, based on measurements from Facebook’s warehouse cluster in production. To the best of our knowledge, this is the first study of its kind available in the literature. Our study reveals that recovery of RS-coded data results in a significant increase in network traffic, more than a hundred terabytes per day, in a cluster storing multiple petabytes of RS-coded data.

To address this issue, we present a new storage code using our recently proposed Piggybacking framework, which reduces the network and disk usage during recovery by 30% in theory, while also being storage-optimal and supporting arbitrary design parameters. The implementation of the proposed code in the Hadoop Distributed File System (HDFS) is underway. We use the measurements from the warehouse cluster to show that the proposed code would reduce cross-rack traffic by close to fifty terabytes per day.
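
The scale of the problem follows from one calculation. The (k = 10, r = 4) layout below matches the Reed-Solomon configuration commonly reported for Facebook's HDFS-RAID; the block size and daily failure count are illustrative assumptions, not the paper's measurements.

    k, r = 10, 4        # 10 data blocks + 4 parity blocks per stripe
    block_mb = 256

    # Classic RS reconstruction of ONE lost block reads k whole blocks:
    rs_mb = k * block_mb                  # 2560 MB moved per lost block

    # A code cutting recovery I/O by ~30% (as the Piggybacking-based
    # construction claims) would instead move about:
    pb_mb = 0.7 * rs_mb                   # ~1792 MB per lost block

    losses_per_day = 50_000               # assumed, for scale
    print(f"RS:          {rs_mb * losses_per_day / 1e6:,.1f} TB/day")
    print(f"Piggybacked: {pb_mb * losses_per_day / 1e6:,.1f} TB/day")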

Available Media

RAIDq: A Software-friendly, Multiple-parity RAID

Ming-Shing Chen, National Taiwan University; Bo-Yin Yang, Academia Sinica; Chen-Mou Cheng, National Taiwan University

As disk manufacturers compete to build ever larger and cheaper disks, the possibility of RAID failures becomes more significant for larger and larger disk arrays, creating opportunities for products beyond RAID 6. In this paper, we present the design and implementation of RAIDq, a software-friendly, multiple-parity RAID. RAIDq uses a linear code with efficient encoding and decoding algorithms and addresses a wide range of general RAID cases that are of practical interest. However, RAIDq does have a limit on how many data disks it can support, which we analyze in this paper. A second benefit of RAIDq is that it includes existing RAID 5 and 6 as special cases and hence is 100% backward compatible. This allows RAIDq to reuse the efficient coding algorithms and implementations of RAID 5 and 6. Last but not least, RAIDq is optimized for software implementation, as its encoding only involves simple XORs and multiplication by several fixed elements in a finite field. Thanks to the popularity of RAID 6, such operations have been highly optimized on modern processors, and RAIDq can take advantage of this, as corroborated by our experimental results.
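
For reference, the finite-field primitive the abstract alludes to is small. Below is a minimal GF(2^8) multiply (reduction polynomial 0x11D, as used by Linux's RAID 6) driving a RAID-6-style P/Q computation; it illustrates the building block, not RAIDq's actual code construction.

    def gf256_mul(a, b):
        """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
        p = 0
        for _ in range(8):
            if b & 1:
                p ^= a
            carry = a & 0x80
            a = (a << 1) & 0xFF
            if carry:
                a ^= 0x1D           # reduce the overflowed bit
            b >>= 1
        return p

    # RAID-6-style parity over data bytes d_i: P = xor(d_i), Q = xor(g^i * d_i)
    data = [0x12, 0x34, 0x56]
    P, Q, g_i = 0, 0, 1             # g_i tracks g^i for generator g = 2
    for d in data:
        P ^= d
        Q ^= gf256_mul(g_i, d)
        g_i = gf256_mul(g_i, 2)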

Available Media

6:30 p.m.–8:00 p.m. Thursday

Poster Session and Happy Hour

Market Street Foyer


Friday, June 28, 2013

8:30 a.m.–9:00 a.m. Friday

Continental Breakfast

Market Street Foyer

9:00 a.m.–10:30 a.m. Friday

Panel Discussion

Software-defined Storage

Panelists: Bill Earl, VMware; David Black, EMC; Jay Kistler, Maginatics; Robert Novak, Nexenta; Umesh Maheshwari, Nimble Storage

In this panel, we will discuss the very controversial and timely topic of software-defined storage. What does it mean? How do we get it? Do we already have it?

Given the adoption of virtualization and the recent advent of software-defined networking, many people believe that the remaining technology needed to construct a truly agile datacenter is software-defined storage. However, every vendor has its own definition of what it means and how to construct it. Some even claim to already have it. In this panel discussion we hope to shed some light on this topic and elaborate on the requirements and challenges to construct a storage layer for the next generation of datacenters.

Available Media

10:30 a.m.–11:00 a.m. Friday

Break with Refreshments

Market Street Foyer

11:00 a.m.–12:30 p.m. Friday

Virtual Machine Data

Session Chair: Windsor Hsu, EMC

Efficiently Storing Virtual Machine Backups

Stephen Smaldone, Grant Wallace, and Windsor Hsu, EMC Corporation

Physical-level backups offer increased performance in terms of throughput and scalability compared to logical backup models, while still maintaining logical consistency. As the trend toward virtualization grows, virtual machine backups (a form of physical backup) are even more important, while becoming easier to perform. The downside is that physical backup generally requires more storage, because of file system metadata and unallocated blocks. Deduplication is becoming widely accepted, and many believe that it will favor logical backup, but this has not been well studied and the relative cost of physical vs. logical backup on deduplicating storage is not known. In this paper, we take a data-driven approach, using real user data to quantify the storage costs and contributing factors of physical backups over numerous generations. Based on our analysis, we show how physical backups can be as storage-efficient as logical backups, while also providing good backup performance.

Available Media

Improving I/O Performance Using Virtual Disk Introspection

Vasily Tarasov and Deepak Jain, Stony Brook University; Dean Hildebrand and Renu Tewari, IBM Research—Almaden; Geoff Kuenning, Harvey Mudd College; Erez Zadok, Stony Brook University

Storage consolidation due to server virtualization puts stringent new requirements on Storage Array (SA) performance. Virtualized workloads require new performance optimizations that cannot be fully addressed merely by using expensive hardware such as SSDs. This position paper presents Virtual Machine Disk Image (VMDI) introspection—a key technique for implementing a variety of virtualization-oriented I/O optimizations. VMDI introspection gives SAs an understanding of guest file system semantics, such as determining object types and classifying read and write operations. We explore possible approaches for VMDI introspection and then describe a set of VMDI-introspection-based optimizations. Our prototype implementation with enhanced meta-data caching and placement shows an 11% to 20% performance improvement.

Available Media

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage

Wei Zhang, Tao Yang, and Gautham Narayanasamy, University of California, Santa Barbara; Hong Tang, Alibaba Inc.

In a virtualized cloud cluster, frequent snapshot backup of virtual disks improves hosting reliability; however, it takes significant memory resources to detect and remove duplicate content blocks among snapshots. This paper presents a low-cost deduplication solution that scales to a large number of virtual machines. The key idea is to separate duplicate detection from the actual storage backup, instead of using inline deduplication, and to partition the global index and detection requests among machines using fingerprint values. Each machine then conducts duplicate detection partition by partition, independently and with minimal memory usage. Another optimization is to allocate and control buffer space for exchanging detection requests and duplicate summaries among machines. Our evaluation shows that the proposed multi-stage scheme uses a small amount of memory while delivering satisfactory backup throughput.
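
A sketch of the partitioning idea: route each chunk fingerprint to a machine (and, within it, a partition) by fingerprint value, so that each machine can later run duplicate detection one partition at a time against a small in-memory index. The counts and function names are illustrative assumptions.

    import hashlib

    N_MACHINES, N_PARTITIONS = 100, 64

    def fingerprint(chunk: bytes) -> bytes:
        return hashlib.sha1(chunk).digest()

    def route(fp: bytes):
        """Map a fingerprint to (machine, partition) deterministically."""
        v = int.from_bytes(fp[:8], "big")
        return v % N_MACHINES, (v // N_MACHINES) % N_PARTITIONS

    # Each machine loads ONE partition's index at a time, streams in the
    # detection requests routed to it, and emits duplicate summaries, so
    # memory use is roughly index_size / (N_MACHINES * N_PARTITIONS).
    machine, part = route(fingerprint(b"4KB chunk of a VM snapshot"))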

Available Media


12:30 p.m.–2:00 p.m. Friday

FCW Luncheon

Market Street Foyer

2:00 p.m.–4:00 p.m. Friday

Storage Performance and Energy

Session Chair: Dan R. K. Ports, University of Washington

Challenges in Getting Flash Drives Closer to CPU

Myoungsoo Jung and Mahmut Kandemir, The Pennsylvania State University

PCI Express solid-state disks (PCIe SSDs) blur the difference between block-semantic and memory-semantic devices. Since these SSDs use the PCIe bus as their storage interface, they differ from conventional memory-system interconnects as well as from thin storage interfaces. This leads to new SSD architectures and storage software stack designs. Unfortunately, few studies have focused on the system characteristics of these emerging PCIe SSD platforms. In this paper, we quantitatively analyze the challenges PCIe SSDs face in getting flash memory closer to the CPU, and study two representative PCIe SSD architectures (from-scratch SSDs and bridge-based SSDs) using state-of-the-art SSDs from two different vendors. Our experimental analysis reveals that 1) while the from-scratch SSD approach offers remarkable performance improvements, it requires enormous host-side memory and computation resources, which may not be acceptable in many computing systems; 2) the performance of the from-scratch SSD significantly degrades in a multi-core system; 3) redundant flash software and controllers should be eliminated from the bridge-based SSD architecture; and 4) the latency of PCIe SSDs significantly degrades with their storage-level queueing mechanism. Finally, we discuss system implications, including potential PCIe SSD applications such as all-flash arrays.

Available Media

Runtime I/O Re-Routing + Throttling on HPC Storage

Qing Liu, Norbert Podhorszki, Jeremy Logan, and Scott Klasky, Oak Ridge National Laboratory

Massively parallel storage systems are becoming more and more prevalent on HPC systems due to the emergence of a new generation of data-intensive applications. To achieve the level of I/O throughput and capacity demanded by data-intensive applications, storage systems typically deploy a large number of storage devices (also known as LUNs or data stores). In doing so, parallel applications are allowed to access storage concurrently, and as a result, the aggregate I/O throughput can be increased linearly with the number of storage devices, reducing the application's end-to-end time. For a production system where storage devices are shared between multiple applications, contention is often a major problem, leading to a significant reduction in I/O throughput. In this paper, we describe our efforts to resolve this issue in the context of HPC using a balanced re-routing + throttling approach. The proposed scheme re-routes I/O requests to a less congested storage location in a controlled manner, so that write performance is improved while limiting the impact on reads.
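
A toy decision function conveys the controlled part of the scheme: divert a write only when its intended target is congested, and only within a divert budget, so reads on the fallback target are not swamped. All thresholds are illustrative assumptions.

    def choose_target(targets, primary, diverted_mb, divert_cap_mb=512):
        """targets: {name: queue_depth}; returns (target, new diverted_mb)."""
        if targets[primary] < 32:         # primary not congested: use it
            return primary, diverted_mb
        if diverted_mb >= divert_cap_mb:  # throttle: budget spent, stop
            return primary, diverted_mb
        alt = min(targets, key=targets.get)   # least-loaded alternative
        return alt, diverted_mb + 4           # account one 4 MB request

    targets = {"lun0": 48, "lun1": 7, "lun2": 12}   # observed queue depths
    target, diverted = choose_target(targets, "lun0", diverted_mb=0)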

Available Media

Specialized Storage for Big Numeric Time Series

Ilari Shafer, Raja R. Sambasivan, Anthony Rowe, and Gregory R. Ganger, Carnegie Mellon University

Numeric time series data has unique storage requirements and access patterns that can benefit from specialized support, given its importance in Big Data analyses. Popular frameworks and databases focus on addressing other needs, making them a suboptimal fit. This paper describes the support needed for numeric time series, suggests an architecture for efficient time series storage, and illustrates its potential for satisfying key requirements.

Available Media

FDIO: A Feedback Driven Controller for Minimizing Energy in I/O-Intensive Applications

Ioannis Manousakis, Manolis Marazakis, and Angelos Bilas, Foundation for Research and Technology - Hellas (FORTH)

The relatively low utilization of servers in data-center environments when running I/O-intensive applications is a key efficiency concern. Energy optimization, by throttling power consumption, is an essential operational goal. Since processors are the most power-hungry components of a server, energy optimization has focused on regulating processor consumption. More recently, however, memory and storage have become increasingly demanding, collectively accounting for more than 40% of the overall energy consumption in typical system configurations. We argue that this trend necessitates tracking overall energy consumption rather than focusing on any single component. Although currently only processors expose fine-grained energy-related controls, we demonstrate that a more holistic approach can obtain significant efficiency benefits. Specifically, our feedback-based controller for Linux detects I/O-intensive phases in workloads and adjusts processor operating frequencies accordingly, more effectively than the standard CPU governors.
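
A sketch of such a feedback loop: when a phase is I/O-bound (the CPU mostly waits on storage), step the frequency down; ramp it back up for compute phases. The thresholds, frequency ladder, and iowait samples are illustrative assumptions, not FDIO's actual policy.

    FREQS_MHZ = [1200, 1800, 2400, 3000]

    def next_freq(cur_idx, iowait_frac):
        if iowait_frac > 0.4 and cur_idx > 0:
            return cur_idx - 1    # I/O phase: CPU speed barely matters
        if iowait_frac < 0.1 and cur_idx < len(FREQS_MHZ) - 1:
            return cur_idx + 1    # compute phase: restore performance
        return cur_idx

    idx = len(FREQS_MHZ) - 1
    for iowait in [0.05, 0.50, 0.60, 0.20, 0.02]:   # one sample per period
        idx = next_freq(idx, iowait)
        print(f"iowait={iowait:.2f} -> {FREQS_MHZ[idx]} MHz")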

Available Media