Wednesday, 9:00 am–10:15 am
Session Chair: Brent Welch, Google
Lanyue Lu, Thanumalayan Sankaranarayana Pillai, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison We present WiscKey, a persistent LSM-tree-based key-value store with a performance-oriented data layout that separates keys from values to minimize I/O amplification. The design of WiscKey is highly SSD optimized,
leveraging both the sequential and random performance characteristics of the device. We demonstrate the advantages of WiscKey with both microbenchmarks and YCSB workloads. Microbenchmark results show that
WiscKey is 2.5x–111x faster than LevelDB for loading a database and 1.6x–14x faster for random lookups. WiscKey is faster than both LevelDB and RocksDB in all six YCSB workloads.
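For readers unfamiliar with the key-value separation technique the abstract refers to, here is a minimal, hypothetical sketch (not WiscKey's implementation): values are appended to a separate value log, while the index keeps only keys and value-log locations, keeping the LSM-tree small.

```python
# Minimal sketch of key-value separation (illustrative only; not WiscKey's
# code). Values go to an append-only value log; the index (a plain dict here,
# standing in for the LSM-tree) stores only the key and the value's location.

class KVSeparatedStore:
    def __init__(self, log_path):
        self.log = open(log_path, "a+b")   # append-only value log on the SSD
        self.index = {}                    # key -> (offset, length)

    def put(self, key, value):
        self.log.seek(0, 2)                # position at the end of the log
        offset = self.log.tell()
        self.log.write(value)
        self.log.flush()
        self.index[key] = (offset, len(value))

    def get(self, key):
        offset, length = self.index[key]   # small index lookup ...
        self.log.seek(offset)
        return self.log.read(length)       # ... then one random read for the value
```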
Hyeontaek Lim and David G. Andersen, Carnegie Mellon University; Michael Kaminsky, Intel Labs Multi-stage log-structured (MSLS) designs, such as LevelDB, RocksDB, HBase, and Cassandra, are a family of storage system designs that exploit the high sequential write speeds of hard disks and flash drives by using multiple append-only data structures. As a first step towards accurate and fast evaluation of MSLS, we propose new analytic primitives and MSLS design models that quickly give accurate performance estimates. Our model can almost perfectly estimate the cost of inserts in LevelDB, whereas the conventional worst-case analysis gives 1.8–3.5x higher estimates than the actual cost. A few minutes of offline analysis using our model can find optimized system parameters that decrease LevelDB’s insert cost by up to 9.4–26.2%; our analytic primitives and model also suggest changes to RocksDB that reduce its insert cost by up to 32.0%, without reducing query performance or requiring extra memory.
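For context, the conventional worst-case analysis that the paper's model improves on can be written in a few lines; the sketch below is an illustrative version of that baseline (growth factor and level count are assumed values), not the authors' analytic primitives.

```python
# Hypothetical sketch of the conventional worst-case write-cost estimate for a
# leveled LSM-tree (the baseline the abstract says overestimates by 1.8-3.5x).
# In the worst case each byte is rewritten about growth_factor times per level.

def worst_case_write_amplification(num_levels, growth_factor=10):
    wal_and_l0 = 2                                # write-ahead log + flush to L0
    compaction = growth_factor * (num_levels - 1)
    return wal_and_l0 + compaction

print(worst_case_write_amplification(num_levels=5))   # 42 rewrites per byte
```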
Heng Zhang, Mingkai Dong, and Haibo Chen, Shanghai Jiao Tong University In-memory key/value store (KV-store) is a key building block for many systems like databases and large websites. Two key requirements for such systems are efficiency and availability, which demand a KV-store to continuously
handle millions of requests per second. A common approach to availability is using replication, such as primary-backup (PBR), which, however, requires M+1 times the memory to tolerate M failures. This leaves scarce memory unavailable for useful user jobs.
This paper makes the first case for building a highly available in-memory KV-store by integrating erasure coding to achieve memory efficiency without notably degrading performance. A main challenge is that an in-memory KV-store has a large amount of scattered metadata: a single KV put may cause excessive coding operations and parity updates due to numerous small metadata updates. Our approach, namely Cocytus, addresses this challenge with a hybrid scheme that leverages PBR for small and scattered data (e.g., metadata and keys), while applying erasure coding only to relatively large data (e.g., values). To mitigate well-known issues like the lengthy recovery of erasure coding, Cocytus uses an online recovery scheme that leverages the replicated metadata to continue serving KV requests during recovery. We have applied Cocytus to Memcached. Evaluation using YCSB with different KV configurations shows that Cocytus incurs low overhead in latency and throughput, tolerates node failures with fast online recovery, and saves 33% to 46% of memory compared to PBR when tolerating two failures.
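A brief sketch may help make the hybrid scheme concrete. The code below is illustrative only: the Node class, the replicate()/store() helpers, and the XOR parity standing in for a real erasure code are all assumptions, not Cocytus's design. Keys and metadata are replicated in full, while only the value is split and coded.

```python
# Illustrative sketch of a hybrid replication/erasure-coding put path (not
# Cocytus's code). Node, replicate(), and store() are hypothetical stand-ins
# for remote nodes; XOR parity stands in for a real code such as Reed-Solomon.

class Node:
    """Hypothetical stand-in for a remote backup/data/parity node."""
    def __init__(self):
        self.data = {}

    def replicate(self, meta):
        self.data[("meta", meta["key"])] = meta

    def store(self, key, chunk):
        self.data[("chunk", key)] = chunk

def xor_parity(chunks):
    parity = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            parity[i] ^= b
    return bytes(parity)

def hybrid_put(key, value, backups, data_nodes, parity_node, k=3):
    # 1. Small, scattered data (key and metadata) is cheap to replicate in full.
    meta = {"key": key, "len": len(value)}
    for node in backups:
        node.replicate(meta)
    # 2. Only the relatively large value is split into k chunks and coded.
    padded = value.ljust(((len(value) + k - 1) // k) * k, b"\0")
    size = len(padded) // k
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    for node, chunk in zip(data_nodes, chunks):
        node.store(key, chunk)
    parity_node.store(key, xor_parity(chunks))

hybrid_put(b"user:42", b"a relatively large value ...",
           backups=[Node()], data_nodes=[Node(), Node(), Node()],
           parity_node=Node())
```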
Grand Ballroom DE Download tutorial materials from the SNIA Web site.
Michael Ault, Oracle Guru, IBM, Inc.
9:00 am–9:45 am IDC has released a document on testing all-flash arrays (AFAs) to provide a common framework for judging AFAs from various manufacturers. This paper provides procedures, scripts, and examples for performing the IDC test framework using the free tool VDBench on AFAs, producing a common set of results for comparing multiple AFAs' suitability for cloud or other network-based storage.
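As a hedged illustration of the kind of scripting the session covers, the sketch below writes a small VDBench parameter file and launches the tool; the device path, workload mix, and run length are placeholders rather than the IDC-prescribed settings.

```python
# Illustrative only: generate a basic VDBench parameter file and run it.
# The device path, 70/30 4 KB random mix, and run length are placeholders;
# substitute the workload definitions prescribed by the IDC test framework.
import subprocess

PARAMS = """\
sd=sd1,lun=/dev/sdb,openflags=o_direct
wd=wd1,sd=sd1,xfersize=4k,rdpct=70,seekpct=100
rd=rd1,wd=wd1,iorate=max,elapsed=600,interval=10
"""

with open("afa_test.parm", "w") as f:
    f.write(PARAMS)

# Assumes the vdbench launcher script is in the current directory.
subprocess.run(["./vdbench", "-f", "afa_test.parm", "-o", "afa_test_out"], check=True)
```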
Learning Objectives:
- Understand the requirements of IDC testing
- Provide guidelines and scripts for use with VDBench for IDC tests
- Demonstrate a framework for evaluating multiple AFAs using IDC guidelines
Mike Ault has worked with computers since 1979 and with Oracle databases since 1990. Mike has spent the last eight years working with Flash storage in relation to Oracle and other database storage needs. Mike is a frequent presenter at user conferences and has written over 24 Oracle-related books. Mike currently works as an Oracle expert for the STG flash group at IBM, Inc.
Carl Waldspurger, Research and Development, CloudPhysics, Inc., and Irfan Ahmad, CTO, CloudPhysics, Inc.
9:45 am–10:30 am The benefits of storage caches are notoriously difficult to model and control, varying widely by workload, and exhibiting complex, nonlinear behaviors. However, recent advances make it possible to analyze and optimize high-performance storage caches using lightweight, continuously-updated miss ratio curves (MRCs). Previously relegated to offline modeling, MRCs can now be computed so inexpensively that they are practical for dynamic, online cache management, even in the most demanding environments.
After reviewing the history and evolution of MRC algorithms, we will examine new opportunities afforded by recent techniques. MRCs capture valuable information about locality that can be leveraged to guide efficient cache sizing, allocation, and partitioning, in order to support diverse goals such as improving performance, isolation, and quality of service. We will also describe how multiple MRCs can be used to track different alternatives at various timescales, enabling online tuning of cache parameters and policies.
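For readers new to MRCs, a minimal sketch of the classic Mattson stack-distance construction is shown below; the recent techniques discussed in the tutorial compute an approximation of the same curve at a small fraction of this cost (this quadratic version is for illustration only).

```python
# Minimal Mattson-style miss ratio curve, for illustration only; the sampled
# algorithms discussed here approximate the same curve at a tiny fraction of
# the cost (this version is quadratic in the trace length).
from collections import OrderedDict

def miss_ratio_curve(trace, max_cache_size):
    stack = OrderedDict()                   # LRU stack, most recent at the end
    hist = [0] * (max_cache_size + 1)       # hist[d] = hits at stack distance d
    for block in trace:
        if block in stack:
            depth = len(stack) - list(stack).index(block)
            if depth <= max_cache_size:
                hist[depth] += 1
            stack.move_to_end(block)
        else:
            stack[block] = True             # cold miss at every cache size
    total, hits, mrc = len(trace), 0, []
    for size in range(1, max_cache_size + 1):
        hits += hist[size]
        mrc.append((size, 1 - hits / total))
    return mrc

print(miss_ratio_curve([1, 2, 3, 1, 2, 3, 4, 1], max_cache_size=4))
```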
Learning Objectives:
- Storage cache modeling and analysis
- Efficient cache sizing, allocation, and partitioning
- Online tuning of commercial storage cache parameters and policies
Carl Waldspurger has been leading research at CloudPhysics since its inception. He is active in the systems research community, and serves as a technical advisor to several startups. For over a decade, Carl was responsible for core resource management and virtualization technologies at VMware. Prior to VMware, he was a researcher at the DEC Systems Research Center. Carl holds a Ph.D. in computer science from MIT.
Irfan Ahmad is the Chief Technology Officer of CloudPhysics, which he cofounded in 2011. Prior to CloudPhysics, Irfan was at VMware, where he was R&D tech lead for the DRS team and co-inventor for flagship products, including Storage DRS and Storage I/O Control. Irfan worked extensively on interdisciplinary endeavors in memory, storage, CPU, and distributed resource management, and developed a special interest in research at the intersection of systems. Irfan also spent several years in performance analysis and optimization, both in systems software and OS kernels. Before VMware, Irfan worked on a software microprocessor at Transmeta.
Wednesday, 10:15 am–10:45 am
Break with Refreshments
Wednesday, 10:45 am–noon
Session Chair: Theodore M. Wong, Human Longevity, Inc.
Tyler Harter, University of Wisconsin—Madison; Brandon Salmon and Rose Liu, Tintri; Andrea C. Arpaci-Dusseau and Remzi H. Arpaci-Dusseau, University of Wisconsin—Madison Containerized applications are becoming increasingly popular, but unfortunately, current container-deployment methods are very slow. We develop a new container benchmark, HelloBench, to evaluate the startup times of 57 different containerized applications. We use HelloBench to analyze workloads in detail, studying the block I/O patterns exhibited during startup and compressibility of container images. Our analysis shows that pulling packages accounts for 76% of container start time, but only 6.4% of that data is read. We use this and other findings to guide the design of Slacker, a new Docker storage driver optimized for fast container startup. Slacker is based on centralized storage that is shared between all Docker workers and registries. Workers quickly provision container storage using backend clones and minimize startup latency by lazily fetching container data. Slacker speeds up the median container development cycle by 20x and deployment cycle by 5x.
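To illustrate the lazy-fetch idea mentioned above, here is a hypothetical sketch (not Slacker's storage driver): a container's reads pull image blocks from shared backend storage on first access and cache them locally, so startup need not wait for a full image pull.

```python
# Hypothetical lazy-fetch read path (not Slacker's driver). fetch_block is a
# stand-in for reading one block of the container image from shared backend
# storage; blocks are materialized locally only when first read.

class LazyImage:
    def __init__(self, fetch_block, block_size=4096):
        self.fetch_block = fetch_block      # callable: block number -> bytes
        self.block_size = block_size
        self.cache = {}                     # locally materialized blocks

    def read(self, offset, length):
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        data = bytearray()
        for blk in range(first, last + 1):
            if blk not in self.cache:       # fetch only what is actually read
                self.cache[blk] = self.fetch_block(blk)
            data += self.cache[blk]
        start = offset - first * self.block_size
        return bytes(data[start:start + length])
```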
Ioan Stefanovici and Bianca Schroeder, University of Toronto; Greg O'Shea, Microsoft Research; Eno Thereska, Confluent and Imperial College London In a data center, an IO from an application to distributed storage traverses not only the network, but also several software stages with diverse functionality. This set of ordered stages is known as the storage or IO stack. Stages
include caches, hypervisors, IO schedulers, file systems, and device drivers. Indeed, in a typical data center, the number of these stages is often larger than the number of network hops to the destination. Yet, while packet routing
is fundamental to networks, no notion of IO routing exists on the storage stack. The path of an IO to an endpoint is predetermined and hard-coded. This forces IO with different needs (e.g., requiring different caching or replica selection) to flow through a one-size-fits-all IO stack structure, resulting in an ossified IO stack.
This paper proposes sRoute, an architecture that provides a routing abstraction for the storage stack. sRoute comprises a centralized control plane and “sSwitches” on the data plane. The control plane sets the forwarding rules in each sSwitch to route IO requests at runtime based on application-specific policies. A key strength of our architecture is that it works with unmodified applications and VMs. This paper shows significant benefits of customized IO routing to data center tenants (e.g., a factor of ten for tail IO latency, more than 60% better throughput for a customized replication protocol and a factor of two in throughput for customized caching).
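To make the routing abstraction concrete, here is a hypothetical sketch of an sSwitch-style forwarding table; the rule fields and names are illustrative assumptions, not sRoute's actual API. The control plane installs match/action rules, and each IO is forwarded to whichever endpoint its first matching rule selects.

```python
# Hypothetical sketch of IO routing: a control plane installs forwarding rules
# and a data-plane sSwitch applies the first matching rule to each IO request.
from dataclasses import dataclass

@dataclass
class IORequest:
    tenant: str
    op: str          # "read" or "write"
    volume: str

class SSwitch:
    def __init__(self, default_endpoint):
        self.rules = []                 # (predicate, endpoint), in priority order
        self.default = default_endpoint

    def install_rule(self, predicate, endpoint):
        """Called by the control plane to encode application-specific policy."""
        self.rules.append((predicate, endpoint))

    def forward(self, io):
        for predicate, endpoint in self.rules:
            if predicate(io):
                return endpoint
        return self.default

# Example policy: tenant A's reads go through a cache tier; all other IO goes
# straight to the default replicated store.
switch = SSwitch(default_endpoint="store://replica-set-1")
switch.install_rule(lambda io: io.tenant == "A" and io.op == "read", "cache://ssd-tier")
print(switch.forward(IORequest(tenant="A", op="read", volume="vol7")))
```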
Sergey Legtchenko, Xiaozhou Li, Antony Rowstron, Austin Donnelly, and Richard Black, Microsoft Research Cloud providers and companies running large-scale data centers offer near-line, cold, and archival data storage, which trade access latency and throughput performance for cost. These often require physical rack-scale storage
designs, e.g. Facebook/Open Compute Project (OCP) Cold Storage or Pelican, which co-design the hardware, mechanics, power, cooling and software to minimize costs to support the desired workload. A consequence is that the rack resources are restricted, requiring a software stack that can operate within the provided resources. The co-design makes it hard to understand the end-to-end performance impact of relatively small physical design changes and, worse, the software stacks are brittle to these changes.
Flamingo supports the design of near-line HDD-based storage racks for cloud services. It requires a physical rack design, a set of resource constraints, and some target performance characteristics. Using these, Flamingo is able to automatically parameterize a generic storage stack to allow it to operate on the physical rack. It is also able to efficiently explore the performance impact of varying the rack resources. It incorporates key principles learned from the design and deployment of cold storage systems. We demonstrate that Flamingo can rapidly reduce the time taken to design custom racks to support near-line storage.
Tom Talpey, Architect, Microsoft
10:45 am–11:30 am The SMB protocol evolved over time from CIFS to SMB1 to SMB2, with implementations by dozens of vendors including most major Operating Systems and NAS solutions. The SMB 3.0 protocol had its first commercial implementations by Microsoft, NetApp and EMC by the end of 2012, and many other implementations exist or are in progress. The SMB3 protocol is currently at 3.1.1 and continues to advance.
This SNIA Tutorial begins by describing the history and basic architecture of the SMB protocol and its operations. The second part of the tutorial covers the various versions of the SMB protocol, with details of improvements over time. The final part covers the latest changes in SMB3, and the resources available in support of its development by industry.
Learning Objectives:
- Understand the basic architecture of the SMB protocol family
- Enumerate the main capabilities introduced with SMB 2.0/2.1
- Describe the main capabilities introduced with SMB 3.0 and beyond
Tom Talpey is an Architect in the File Server Team at Microsoft. His responsibilities include SMB 3, SMB Direct (SMB over RDMA), and all the protocols and technologies that support the SMB ecosystem. Tom has worked in the areas of network filesystems, network transports and RDMA for many years and recently has been working on storage traffic management, with application not only to SMB but in broad end-to-end scenarios. He is a frequent presenter at Storage Dev.
Mark Carlson, Principal Engineer, Industry Standards, Toshiba
11:30 am–12:15 pm A number of scale out storage solutions, as part of open source and other projects, are architected to scale out by incrementally adding and removing storage nodes. Example projects include:
- Hadoop’s HDFS
- CEPH
- Swift (OpenStack object storage)
The typical storage node architecture includes inexpensive enclosures with IP networking, CPU, memory, and direct-attached storage (DAS). While inexpensive to deploy, these solutions become harder to manage over time. The power and space requirements of data centers are difficult to meet with this type of solution. Object drives further partition these object systems, allowing storage to scale up and down in single-drive increments.
This talk will discuss the current state and future prospects for object drives. Use cases and requirements will be examined and best practices will be described.
Learning Objectives:
- What are object drives?
- What value do they provide?
- Where are they best deployed?
Mark A. Carlson has more than 35 years of experience with networking and storage development and more than 18 years experience with Java technology. Mark was one of the authors of the CDMI Cloud Storage standard. He has spoken at numerous industry forums and events. He is the co-chair of the SNIA Cloud Storage and Object Drive technical working groups, and serves as vice chair on the SNIA Technical Council.
Wednesday, noon–2:00 pm
Lunch (on your own)
Wednesday, 2:00 pm–3:30 pm
Session Chair: Ethan L. Miller, University of California, Santa Cruz, and Pure Storage
Mohammed G. Khatib and Zvonimir Bandic, WDC Research Power efficiency is a pressing concern in today's cloud systems. Datacenter architects are responding with various strategies, including capping the power available to computing systems. Throttling bandwidth has been proposed to cap the power usage of the disk drive. This work revisits throttling and addresses its shortcomings. We show that, contrary to common belief, the disk's power usage does not always increase as the disk's throughput increases. Furthermore, throttling unnecessarily sacrifices I/O response times by idling the disk. We propose a technique that resizes the queues of the disk to cap its power. Resizing queues not only imposes no delays on servicing
requests, but also enables performance differentiation.
We present the design and implementation of PCAP, an agile performance-aware power capping system for the disk drive. PCAP dynamically resizes the disk's queues to cap power. It operates in two performance-aware
modes, throughput and tail-latency, making it viable for cloud systems with service-level differentiation. We evaluate PCAP for different workloads and disk drives. Our experiments show that PCAP reduces power by up to 22%. Further, under PCAP, 60% of the requests observe service times below 100 ms compared to just 10% under throttling. PCAP also reduces worst-case latency by 50% and increases throughput by 32% relative to throttling.
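A minimal sketch of the queue-resizing idea follows (illustrative only, not PCAP's controller; measure_power() and set_queue_depth() are hypothetical hooks): a feedback loop shrinks the disk's queue depth when measured power exceeds the cap and restores it when there is headroom.

```python
# Illustrative feedback loop for power capping by queue resizing (not PCAP's
# controller). measure_power() and set_queue_depth() are hypothetical hooks
# into the drive/host; the control step is deliberately simple.

def power_cap_loop(measure_power, set_queue_depth, power_cap_watts,
                   iterations=100, min_depth=1, max_depth=32):
    depth = max_depth
    for _ in range(iterations):
        watts = measure_power()
        if watts > power_cap_watts and depth > min_depth:
            depth -= 1        # fewer outstanding requests -> lower power draw
        elif watts < 0.95 * power_cap_watts and depth < max_depth:
            depth += 1        # headroom available -> restore depth/throughput
        set_queue_depth(depth)
    return depth
```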
Qingshu Chen, Liang Liang, Yubin Xia, and Haibo Chen, Shanghai Jiao Tong University Copy-on-write virtual disks (e.g., qcow2 images) provide many useful features like snapshot, de-duplication, and full-disk encryption. However, our study uncovers that they introduce additional metadata for block organization
and notably more disk sync operations (e.g., more than 3X for qcow2 and 4X for VMDK images). To mitigate such sync amplification, we propose three optimizations, namely per-virtual-disk internal journaling, dual-mode journaling, and adaptive preallocation, which eliminate the extra sync operations while preserving those features in a consistent way. Our evaluation shows that the three optimizations result in up to 110% performance speedup for varmail and 50% for TPCC.
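As a hedged illustration of the dual-mode journaling idea (simplified, not the authors' implementation): metadata updates for a virtual disk are batched in a per-image journal and forced to stable storage only when the guest issues a flush, rather than syncing on every metadata change.

```python
# Simplified per-virtual-disk journal (hypothetical): metadata updates are
# buffered and written with a single sync when the guest flushes, instead of
# issuing one sync per metadata change.
import os

class VirtualDiskJournal:
    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self.pending = []

    def log_metadata_update(self, record):
        self.pending.append(record)         # no per-update fsync

    def guest_flush(self):
        if self.pending:
            os.write(self.fd, b"".join(self.pending))
            os.fsync(self.fd)               # one sync covers the whole batch
            self.pending = []
```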
Pantazis Deligiannis, Imperial College London; Matt McCutchen, Massachusetts Institute of Technology; Paul Thomson, Imperial College London; Shuo Chen, Microsoft; Alastair F. Donaldson, Imperial College London; John Erickson, Cheng Huang, Akash Lal, Rashmi Mudduluru, Shaz Qadeer, and Wolfram Schulte, Microsoft Testing distributed systems is challenging due to multiple sources of nondeterminism. Conventional testing techniques, such as unit, integration and stress testing, are ineffective in preventing serious but subtle bugs from reaching production. Formal techniques, such as TLA+, can only verify high-level specifications of systems at the level of logic-based models, and fall short of checking the actual executable code. In this paper, we present a new methodology for testing distributed systems. Our approach applies advanced systematic testing techniques to thoroughly check that the executable code adheres to its high-level specifications, which significantly improves coverage of important system behaviors.
Our methodology has been applied to three distributed storage systems in the Microsoft Azure cloud computing platform. In the process, numerous bugs were identified, reproduced, confirmed and fixed. These bugs required a subtle combination of concurrency and failures, making them extremely difficult to find with conventional testing techniques. An important advantage of our approach is that a bug is uncovered in a small setting and witnessed by a full system trace, which dramatically increases the productivity of debugging.
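A toy sketch of the systematic-testing idea (not the tool used in the paper): the harness takes control of nondeterminism and enumerates interleavings of pending handlers, checking a safety invariant in every reachable state; the acknowledge-before-persist bug below is found only under one of the two orderings.

```python
# Toy systematic tester (illustrative): explore every interleaving of pending
# handlers and check a safety invariant after each step.
from itertools import permutations

def explore(handlers, make_initial_state, invariant):
    violations = []
    for order in permutations(range(len(handlers))):
        state = make_initial_state()
        for i in order:
            state = handlers[i](state)
            if not invariant(state):
                violations.append((order, state))   # witness: schedule + state
                break
    return violations

# Example: a write must be persisted before it is acknowledged. Only the
# ordering that acknowledges first violates the invariant, and it is found.
ack = lambda s: {**s, "acked": True}
persist = lambda s: {**s, "persisted": True}
print(explore([ack, persist],
              lambda: {"acked": False, "persisted": False},
              lambda s: not (s["acked"] and not s["persisted"])))
```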
Mingzhe Hao, University of Chicago; Gokul Soundararajan and Deepak Kenchammana-Hosekote, NetApp, Inc.; Andrew A. Chien and Haryadi S. Gunawi, University of Chicago We study storage performance in over 450,000 disks and 4,000 SSDs over 87 days for an overall total of 857 million (disk) and 7 million (SSD) drive hours. We find that storage performance instability is not uncommon:
0.2% of the time, a disk is more than 2x slower than its peer drives in the same RAID group (and 0.6% for SSD). As a consequence, disk and SSD-based RAIDs experience at least one slow drive (i.e., storage tail) 1.5% and 2.2% of the time. To understand the root causes, we correlate slowdowns with other metrics (workload I/O rate and size, drive event, age, and model). Overall, we find that the primary causes of slowdowns are the internal characteristics and idiosyncrasies of modern disk and SSD drives. We observe that storage tails can adversely impact RAID performance, motivating the design of tail-tolerant RAID. To the best of our knowledge, this work is the most extensive documentation of storage performance instability in the field.
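For intuition, here is a minimal, hypothetical sketch of the slowdown metric described above (not the authors' analysis pipeline): each drive's latency over a window is compared with the median of its RAID-group peers, and a drive more than 2x slower is flagged as a storage tail.

```python
# Illustrative slow-drive (storage tail) detection: flag a drive whose latency
# over a window exceeds 2x the median of its peers in the same RAID group.
from statistics import median

def find_tail_drives(latencies_ms, slowdown_threshold=2.0):
    """latencies_ms maps drive id -> average latency over one window."""
    peer_median = median(latencies_ms.values())
    return [drive for drive, lat in latencies_ms.items()
            if lat > slowdown_threshold * peer_median]

print(find_tail_drives({"d0": 4.1, "d1": 3.9, "d2": 4.3, "d3": 12.7}))  # ['d3']
```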
Ramin Elahi, Adjunct Faculty, UC Santa Cruz Silicon Valley
2:00 pm–2:45 pm Fog computing extends cloud computing by bringing computing and services to the edge of the network. Fog provides data, compute, storage, and application services to end users. The distinguishing Fog characteristics are its proximity to end users, its dense geographical distribution, and its support for mobility. Services are hosted at the network edge or even on end devices such as set-top boxes or access points. Thus, it can alleviate issues the IoT (Internet of Things) is expected to produce by reducing service latency and improving QoS, resulting in a superior user experience. Fog computing supports emerging Internet of Everything (IoE) applications that demand real-time/predictable latency (industrial automation, transportation, networks of sensors and actuators). Thanks to its wide geographical distribution, the Fog paradigm is well positioned for real-time big data and real-time analytics. Fog supports densely distributed data collection points, hence adding a fourth axis to the often-mentioned Big Data dimensions (volume, variety, and velocity).
Ramin Elahi, MSEE, is an Adjunct Professor and Advisory Board Member at UC Santa Cruz Silicon Valley. He has taught Data Center Storage, Unix Networking, and System Administration at the University of California, Santa Cruz and University of California, Berkeley Extensions since 1996. He is also a Senior Education Consultant at EMC Corp. He has served as a Training Solutions Architect at NetApp, where he managed the engineering on-boarding and training curricula development. Prior to NetApp, he was Training Site Manager at Hitachi Data Systems Academy, in charge of the development and delivery of enterprise storage array certification programs. He was also the global network storage curricula manager at Hewlett-Packard. His areas of expertise are data center storage design and architecture, Data ONTAP, cloud storage, and virtualization. He also held a variety of positions at Cisco, Novell, and SCO as a consultant and escalation engineer. He implemented the first university-level data storage and virtualization curriculum in Northern California back in 2007.
Thomas Rivera, Senior Technical Associate, HDS
2:45 pm–3:30 pm After reviewing the diverging data protection legislation in the EU member states, the European Commission (EC) decided that this situation would impede the free flow of data within the EU zone. The EC response was to undertake an effort to "harmonize" the data protection regulations, and it started the process by proposing a new data protection framework. This proposal includes some significant changes like defining a data breach to include data destruction, adding the right to be forgotten, adopting the U.S. practice of breach notifications, and many other new elements. Another major change is a shift from a directive to a rule, which means the protections are the same for all 27 countries and includes significant financial penalties for infractions. This tutorial explores the new EU data protection legislation and highlights the elements that could have significant impacts on data handling practices.
Learning Objectives:
- Highlight the major changes to the previous data protection directive
- Review the differences between "Directives" versus "Regulations," as it pertains to the EU legislation
- Learn the nature of the Reforms as well as the specific proposed changes—in both the directives and the regulations
Thomas Rivera has over 30 years of experience in the storage industry, specializing in file services and data protection technology, and is a senior technical associate with Hitachi Data Systems. Thomas is also an active member of the Storage Networking Industry Association (SNIA) as an elected member of the SNIA Board of Directors, and is co-chair of the Data Protection and Capacity Optimization (DPCO) Committee, as well as a member of the Security Technical Working Group, and the Analytics and Big Data Committee.
Wednesday, 3:30 pm–4:00 pm
Break with Refreshments
Wednesday, 4:00 pm–5:30 pm
Session Chairs: Haryadi Gunawi, University of Chicago; Daniel Peek, Facebook
Liang Ming, Research Engineer, Development and Research, Distributed Storage Field, Huawei
4:00 pm–5:30 pm We will first introduce the current status and pain points of Huawei's distributed storage technology, and then present the next generation of key-value converged storage. Next, we will discuss the concept of key-value storage and show what we have done to promote the key-value standard.
We will then show how we build our block, file, and object services on the same key-value converged storage pool. Finally, the future of storage technology for VMs and containers will be discussed. The talk is aimed at storage engineers, and we look forward to discussing converged storage technology with storage peers.
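As a hedged illustration of building a block service on a shared key-value pool (a hypothetical mapping, not Huawei's design), each fixed-size block of a volume can be stored under a key derived from the volume id and block number.

```python
# Hypothetical sketch of a block service on a key-value pool (not Huawei's
# implementation): each fixed-size block maps to a key derived from the
# volume id and block number; any dict-like KV pool works for the sketch.

BLOCK_SIZE = 4096

class BlockOnKV:
    def __init__(self, kv_pool, volume_id):
        self.kv = kv_pool
        self.volume = volume_id

    def _key(self, block_no):
        return f"{self.volume}/blk/{block_no}".encode()

    def write(self, offset, data):
        assert offset % BLOCK_SIZE == 0 and len(data) == BLOCK_SIZE
        self.kv[self._key(offset // BLOCK_SIZE)] = data

    def read(self, offset):
        assert offset % BLOCK_SIZE == 0
        return self.kv.get(self._key(offset // BLOCK_SIZE), b"\0" * BLOCK_SIZE)

vol = BlockOnKV({}, "vol-001")
vol.write(0, b"x" * BLOCK_SIZE)
print(vol.read(0)[:1])   # b'x'
```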
Learning Objectives:
- Convergence, Consolidation, and Virtualization of Infrastructure, Storage Devices, and Servers
- Deployment: use cases and typical deployment or operational considerations
Wednesday, 6:00 pm–8:00 pm
Grand Ballroom Foyer/TusCA Courtyard
Sponsored by NetApp Check out the cool new ideas and the latest preliminary research on display at the Poster Session and Reception. Take part in discussions with your colleagues over complimentary food and drinks.
View the list of accepted posters.