8:30 a.m.–9:00 a.m., Thursday
Continental Breakfast
Market Street Foyer
9:00 a.m.–10:30 a.m., Thursday
Program Chair: Ajay Gulati, VMware
Peter Godman, CEO, Qumulo While the world's stored data grows exponentially (doubling roughly every two years), hardware grows exponentially cheaper. Long-term, up-front storage acquisition practices create prodigious idle capacity purchased at premium prices. Scale-out storage promises to address this problem by turning static, monolithic systems into growing, distributed systems. Scale-out must clear many hurdles to become a seamless replacement for all monolithic storage, and petabyte-scale scale-out storage systems create new data-manageability challenges.
Peter Godman brings 20 years of industry systems experience to Qumulo. As VP of Engineering and CEO of Corensic, Peter brought the world's first thin-hypervisor-based product to market. As Director of Software Engineering at Isilon, he led development of several major releases of Isilon's award-winning OneFS distributed filesystem and invented 18 patented technologies. Peter studied math and computer science at MIT.
10:30 a.m.–11:00 a.m., Thursday
Break with Refreshments
Market Street Foyer
11:00 a.m.–12:30 p.m., Thursday
Session Chair: Binny Gill, Nutanix
Lanyue Lu, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison File systems do not properly isolate faults that occur within them. As a result, a single fault may adversely affect multiple clients, making the entire file system unavailable. We introduce a new file system abstraction, called file pod, that allows applications to manage failure and recovery policies explicitly for a group of files. Based on this abstraction, we propose the isolation file system, which provides fine-grained fault isolation and quick recovery.
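A minimal sketch of what a file-pod style grouping might look like in practice; every name and policy value below (FilePod, on_fault, recovery) is an illustrative assumption, not the paper's actual interface:

    # Hypothetical sketch of a file-pod style fault-isolation abstraction.
    # All names and policy strings here are invented for illustration.
    class FilePod:
        def __init__(self, name, files, on_fault="isolate-pod", recovery="pod-fsck"):
            self.name = name
            self.files = set(files)   # files whose faults are contained together
            self.on_fault = on_fault  # fail only this pod, not the whole file system
            self.recovery = recovery  # e.g., check/repair only this pod's metadata

        def add(self, path):
            self.files.add(path)

    # A fault in one pod should leave files in other pods fully available.
    mail_pod = FilePod("mail", ["/var/mail/alice"])
    db_pod = FilePod("db", ["/srv/db/data"], recovery="snapshot-rollback")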
Yupu Zhang, Chris Dragga, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison Cloud-based file synchronization services, such as Dropbox, have never been more popular. They provide excellent reliability and durability in their server-side storage and can present a consistent view of synchronized files across multiple clients. However, the loose coupling between these services and the local file system may, in some cases, turn these benefits into drawbacks. In this paper, we show that these services can silently propagate both local data corruption and the results of inconsistent crash recovery, and cannot guarantee that the data they store reflects the actual state of the disk. We propose techniques to prevent and recover from these problems by reducing the separation between local file systems and synchronization clients, providing clients with deeper knowledge of file system activity and allowing the file system to take advantage of the correct data stored remotely.
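One prevention idea from the abstract, checking local integrity before the sync client ships data to the cloud, can be sketched as follows; the checksum_db handoff from the file system is an assumed mechanism for illustration:

    import hashlib

    # Sketch: refuse to upload a file whose content no longer matches the
    # checksum recorded by the local file system, so corruption is caught
    # locally instead of silently propagated to every synced client.
    def safe_upload(path, checksum_db):
        with open(path, "rb") as f:
            data = f.read()
        digest = hashlib.sha256(data).hexdigest()
        recorded = checksum_db.get(path)  # assumed to be maintained by the FS
        if recorded is not None and recorded != digest:
            raise IOError(f"{path}: local copy is corrupt; recover from the cloud copy")
        checksum_db[path] = digest
        # the actual upload of `data` would happen here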
Nitin Agrawal, Akshat Aranya, and Cristian Ungureanu, NEC Labs America Mobile applications are becoming increasingly data-centric, often relying on cloud services to store, share, and analyze data. App developers frequently have to manage the local storage on the device (e.g., SQLite databases, file systems) as well as data synchronization with cloud services. Developers must address common issues such as data packaging, handling network failures, supporting disconnected operation, propagating changes, and detecting and resolving conflicts. To free mobile developers from this burden, we are building Simba, a platform for rapidly developing and deploying data-centric mobile apps. Simba provides a unified storage and synchronization API for both structured data and unstructured objects. Apps can specify a data model spanning tables and objects, and atomically sync such data with the cloud without worrying about network disruptions. Simba is also frugal in its consumption of network resources.
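A rough sketch of the kind of unified table-plus-object model the abstract describes; SimbaTable, sync, and upload_atomically are invented stand-ins, not Simba's real API:

    # Invented stand-in for the network layer; a real client would batch,
    # retry on failure, and resume across disconnections.
    def upload_atomically(table, batch):
        print(f"syncing {len(batch)} row(s) of {table} as one atomic unit")

    class SimbaTable:
        def __init__(self, name, columns, object_columns):
            self.name = name
            self.columns = columns                # structured, SQL-like fields
            self.object_columns = object_columns  # unstructured blobs (photos, files)
            self.rows, self.dirty = [], []

        def insert(self, row):
            self.rows.append(row)
            self.dirty.append(row)  # queued for the next sync

        def sync(self):
            # Row data and its objects travel together, so a network failure
            # cannot leave the cloud with a row but no matching object.
            batch, self.dirty = self.dirty, []
            upload_atomically(self.name, batch)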
12:30 p.m.–2:00 p.m., Thursday
FCW Luncheon
Market Street Foyer
2:00 p.m.–3:30 p.m., Thursday
Session Chair: Steven Swanson, University of California, San Diego
Jaeyong Jeong, Sangwook Shane Hahn, Sungjin Lee, and Jihong Kim, Seoul National University We propose a new approach, called dynamic program and erase scaling (DPES), for improving the endurance of NAND flash memory. The DPES approach is based on our key finding that NAND endurance depends on the erase voltage as well as on the number of P/E cycles. Since NAND endurance has a near-linear dependence on the erase voltage, lowering the erase voltage is an effective way of improving endurance. By modifying NAND chips to support multiple write modes with different erase voltages, DPES enables flash software to exploit the new tradeoff between NAND endurance and write speed. In this paper, we present a novel NAND endurance model which accurately captures the tradeoff between endurance and write speed under dynamic program and erase scaling. Based on our NAND endurance model, we have implemented the first DPES-aware FTL, called autoFTL, which improves NAND endurance with negligible degradation in overall write throughput. Our experimental results using various I/O traces show that autoFTL can improve the maximum number of P/E cycles by 45% over an existing DPES-unaware FTL with less than a 0.2% decrease in overall write throughput.
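The voltage-endurance tradeoff can be illustrated with a toy near-linear model; the coefficients below are invented for illustration and are not the paper's measured parameters:

    # Toy model: wear per P/E cycle scales near-linearly with erase voltage,
    # so a lower (slower) erase stretches total endurance. Coefficients are
    # invented for illustration only.
    def max_pe_cycles(v_erase, v_nominal=14.0, base_cycles=3000, slope=1.8):
        wear = 1.0 + slope * (v_erase - v_nominal) / v_nominal
        return base_cycles / max(wear, 1e-6)

    print(round(max_pe_cycles(14.0)))  # nominal voltage: ~3000 cycles
    print(round(max_pe_cycles(12.0)))  # gentler erase: noticeably more cycles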
Dong In Shin, Taejin Infotech; Young Jin Yu and Hyeong S. Kim, Seoul National University; Jae Woo Choi and Do Yung Jung, Taejin Infotech; Heon Y. Yeom, Seoul National University Emerging non-volatile memory technologies, deployed as disk-drive replacements, raise software-stack and interface issues that disk-based storage systems have not had to consider. In this work, we present new cooperative schemes spanning software and hardware to address performance issues in deploying storage-class memory as a storage device. First, we propose a new polling scheme, called dynamic interval polling, that avoids unnecessary polls and reduces the burden on the storage system bus. Second, we propose pipelined execution between the storage device and the host OS, called pipelined post-I/O processing. By extending vendor-specific I/O interfaces between software and hardware, we can improve the responsiveness of I/O requests without sacrificing throughput.
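The shape of dynamic interval polling can be sketched as below; device.is_done() and the moving-average predictor are assumptions for illustration, not the paper's exact heuristic:

    import time

    # Sketch: sleep through most of the predicted service time instead of
    # polling the device continuously, then busy-wait only for the residual.
    def wait_for_completion(device, latency_history, alpha=0.8):
        # latency_history must be seeded with at least one estimate (seconds)
        predicted = sum(latency_history) / len(latency_history)
        time.sleep(predicted * alpha)       # skip polls that cannot succeed yet
        start = time.monotonic()
        while not device.is_done():         # assumed device-status interface
            pass
        latency_history.append(predicted * alpha + (time.monotonic() - start))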
Peter Desnoyers, Northeastern University Flash memory has been an active topic of research in recent years, but hard information about the parameters and behavior of both flash chips and SSDs has been difficult to obtain for those outside of the industry. In this paper several misconceptions found in the literature are addressed, in order to enable future researchers to avoid some of the errors found in prior work.
We examine the following topics: flash device parameters such as page and erase block size, speed, and reliability, as well as flash translation layer (FTL) requirements and behavior under random and sequential I/O. We have endeavored to find public sources for our claims, and provide experimental evidence in several cases. In doing so, we provide previously unpublished results showing the viability of random writes on commodity SSDs when restricted to a sufficiently small portion of the logical address space.
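The restricted-random-write experiment has roughly this shape (file name, span, and I/O sizes are assumed; a real test would issue direct I/O against the SSD itself rather than a file):

    import os, random

    span, block = 1 << 30, 4096                  # 1 GiB slice, 4 KiB writes (assumed)
    fd = os.open("testfile.bin", os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, span)
    buf = os.urandom(block)
    for _ in range(10_000):
        off = random.randrange(span // block) * block
        os.pwrite(fd, buf, off)                  # random write confined to the slice
    os.close(fd)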
3:30 p.m.–4:00 p.m., Thursday
Break with Refreshments
Market Street Foyer
4:00 p.m.–5:30 p.m., Thursday
Session Chair: Peter Desnoyers, Northeastern University
Sangwhan Moon and A. L. Narasimha Reddy, Texas A&M University Parity protection at the system level is typically employed to build reliable storage systems. However, careful consideration is required when SSD-based systems employ parity protection. First, additional writes are required for parity updates. Second, parity consumes space on the device, which results in write amplification from less efficient garbage collection at higher space utilization.
This paper analyzes the effectiveness of SSD-based RAID and discusses its potential benefits and drawbacks in terms of reliability. A Markov model is presented to estimate the lifetime of SSD-based RAID systems in different environments. In a single array, our preliminary results show that parity protection provides a benefit only at considerably low space utilizations and low data access rates. In a large system, however, RAID improves data lifetime even when write amplification is taken into account.
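For intuition, the classic single-parity version of such a Markov model has a closed-form mean time to data loss; the rates below are invented, and the paper's model additionally accounts for write amplification:

    # Toy Markov model of an n-device single-parity array: states count failed
    # devices, and two concurrent failures mean data loss (standard closed form
    # for exponential failure and repair rates).
    def mttdl(n, fail_rate, repair_rate):
        lam, mu = fail_rate, repair_rate
        return ((2 * n - 1) * lam + mu) / (n * (n - 1) * lam ** 2)

    # e.g., 8 SSDs, one failure per 10^6 device-hours, 24-hour rebuilds (invented)
    print(f"{mttdl(8, 1e-6, 1 / 24):.2e} hours")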
K. V. Rashmi and Nihar B. Shah, University of California, Berkeley; Dikang Gu, Hairong Kuang, and Dhruba Borthakur, Facebook; Kannan Ramchandran, University of California, Berkeley Erasure codes, such as Reed-Solomon (RS) codes, are increasingly employed in data centers to combat the cost of reliably storing large amounts of data. Although these codes provide optimal storage efficiency, they require significantly more network and disk traffic during recovery of missing data.
In this paper, we first present a study on the impact of recovery operations of erasure-coded data on the data-center network, based on measurements from Facebook’s warehouse cluster in production. To the best of our knowledge, this is the first study of its kind available in the literature. Our study reveals that recovery of RS-coded data results in a significant increase in network traffic, more than a hundred terabytes per day, in a cluster storing multiple petabytes of RS-coded data.
To address this issue, we present a new storage code based on our recently proposed Piggybacking framework that reduces network and disk usage during recovery by 30% in theory, while remaining storage-optimal and supporting arbitrary design parameters. An implementation of the proposed code in the Hadoop Distributed File System (HDFS) is underway. We use the measurements from the warehouse cluster to show that the proposed code would reduce cross-rack traffic by close to fifty terabytes per day.
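The arithmetic behind such traffic numbers is easy to reproduce in toy form; the block size and daily failure volume below are invented, not Facebook's measurements:

    # Recovering one block of an RS(k=10) stripe reads k surviving blocks.
    k, block_mb = 10, 256                 # data blocks per stripe; block size (assumed)
    blocks_lost_per_day = 50_000          # hypothetical failure volume
    rs_tb = blocks_lost_per_day * k * block_mb / 1e6   # MB -> TB per day
    pb_tb = rs_tb * (1 - 0.30)            # the paper's claimed 30% reduction
    print(f"RS: {rs_tb:.0f} TB/day, Piggybacked: {pb_tb:.0f} TB/day")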
Ming-Shing Chen, National Taiwan University; Bo-Yin Yang, Academia Sinica; Chen-Mou Cheng, National Taiwan University As disk manufacturers compete to build ever larger and cheaper disks, the possibility of RAID failures grows with array size, creating opportunities for products beyond RAID 6. In this paper, we present the design and implementation of RAIDq, a software-friendly, multiple-parity RAID. RAIDq uses a linear code with efficient encoding and decoding algorithms and addresses a wide range of practically interesting RAID configurations. However, RAIDq does have a limit on how many data disks it can support, which we analyze in this paper. A second benefit of RAIDq is that it includes existing RAID 5 and 6 as special cases and hence is fully backward compatible, allowing RAIDq to reuse the efficient coding algorithms and implementations of RAID 5 and 6. Last but not least, RAIDq is optimized for software implementation, as its encoding involves only simple XORs and multiplication by a few fixed elements of a finite field. Thanks to the popularity of RAID 6, such operations are highly optimized on modern processors, and RAIDq can take advantage of this, as corroborated by our experimental results.
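The core primitive the abstract refers to, XOR plus multiplication by a fixed field element, is the same one RAID 6 uses for its Q parity. A minimal version over GF(2^8) with the usual 0x11d polynomial follows; the paper's exact code construction differs:

    # Multiply a byte by the generator g = 2 in GF(2^8) mod x^8+x^4+x^3+x^2+1.
    def gf_mul2(b):
        b <<= 1
        return (b ^ 0x11D) & 0xFF if b & 0x100 else b

    # Q parity: Q = d0 + g*d1 + g^2*d2 + ..., evaluated Horner-style.
    def q_parity(data_bytes):
        q = 0
        for d in reversed(data_bytes):
            q = gf_mul2(q) ^ d
        return q

    print(hex(q_parity([0x12, 0x34, 0x56])))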
6:30 p.m.–8:00 p.m., Thursday