Vault '19 Program

All sessions will be held in Commonwealth Ballroom unless otherwise noted.

There are separate registration fees for the Tutorial and Technical Sessions listed below. For more information, please view the Vault '19 Registration Information page.

Downloads for Registered Attendees
(Sign in to your USENIX account to download these files.)

Attendee Files 
Vault '19 Attendee List (PDF)

Monday, February 25

8:00 am–9:00 am

Continental Breakfast

Grand Ballroom Foyer

9:00 am–10:30 am

Morning Tutorial

Introduction to Storage for Containers

Monday, 9:00 am–12:30 pm

Vasily Tarasov, Dimitris Skourtis, Ted Anderson, and Ali Anwar, IBM Research

Containers and related technologies make it possible to manage computational resources at fine granularity, increase the pace of software development, testing, and deployment, and at the same time improve the efficiency of infrastructure utilization. Recognizing these benefits, many enterprises are upgrading their technology by incorporating containers in their infrastructure and workflows.

As containerization technologies enter the enterprise market, they meet new functional demands. Providing and managing persistent, highly available, yet nimble storage is a particularly important requirement. A number of new and existing companies and open-source projects are aggressively entering this arena. We expect that in the coming years the demand for professionals who are fluent in storage for containers will rise dramatically.

In our tutorial we plan to cover all major topics of storage for containers. We will first describe the structure of Docker's layered images, its local CoW-based storage, and the Docker registry. We will then present the concept of persistent volumes and dynamic provisioning in Kubernetes. As part of the tutorial, we will use the insights and examples that we accumulated while adapting IBM's Spectrum Scale to containerized environments.
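As a hedged illustration of the dynamic-provisioning concept mentioned above (not part of the tutorial materials), the following sketch uses the official Kubernetes Python client to request a PersistentVolumeClaim against a StorageClass; the class name "container-storage" and the claim name are hypothetical placeholders.

    # Minimal sketch: dynamic provisioning via a PersistentVolumeClaim.
    # The StorageClass name "container-storage" is a hypothetical placeholder.
    from kubernetes import client, config

    config.load_kube_config()  # uses the local kubeconfig
    core = client.CoreV1Api()

    pvc = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="demo-data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            storage_class_name="container-storage",  # hypothetical class
            resources=client.V1ResourceRequirements(requests={"storage": "1Gi"}),
        ),
    )
    core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
    # The provisioner behind the StorageClass creates a volume on demand and
    # binds it to the claim; a Pod can then mount the claim by name.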

Vasily Tarasov, IBM Research

Vasily Tarasov is a Research Staff Member at IBM. His current research projects include storage for containers and high-performance file systems as a service. Vasily has worked extensively on storage, file systems, data deduplication, and performance and workload analysis. He is the author of numerous academic papers, regularly serves on the program committees of major scientific conferences, and has given many presentations to large audiences.

Dimitris Skourtis, IBM Research

Dimitris Skourtis is a Research Staff Member at IBM. His current work is around cloud orchestrators and persistent storage for containers. Prior to IBM, he worked on resource management at VMware, where he prototyped and shipped SIOCv2, a policy-driven storage scheduling solution, as part of vSphere 6.5. Dimitris received his Ph.D. focusing on flash storage and predictable performance at UC Santa Cruz.

Ted Anderson, IBM Research

Ted Anderson is a Senior Software Engineer with IBM Research. Ted has extensive experience with several distributed file systems, most recently Spectrum Scale/GPFS. His recent work uses concurrency, caching, and delegation, with distributed coherency protocols guaranteeing correctness, to maximize the performance of parallel applications.

Ali Anwar, IBM Research

Ali Anwar is a Research Staff Member at IBM Research. He received his Ph.D. in Computer Science from Virginia Tech. In his earlier years he worked as a tools developer (GNU GDB) at Mentor Graphics. Ali's research interests are in distributed computing systems, cloud storage management, file and storage systems, and the intersection of systems and machine learning.

10:30 am–11:00 am

Break with Refreshments

Grand Ballroom Foyer

11:00 am–12:30 pm

Morning Tutorial (continued)

Introduction to Storage for Containers

Monday, 9:00 am–12:30 pm

Vasily Tarasov, Dimitris Skourtis, Ted Anderson, and Ali Anwar, IBM Research

See the full session description and speaker bios under the 9:00 am session above.

12:30 pm–1:30 pm

Tutorial Luncheon

Back Bay Ballroom AB

1:30 pm–3:00 pm

Afternoon Tutorial 1

Managed File Services in the Cloud: What to Use, Where, and Why?

Monday, 1:30 pm–3:00 pm

Geert Jansen and Jacob Strauss, Amazon Web Services

This tutorial is targeted towards administrators and developers who would like to understand the latest developments in cloud-based managed file services. We'll start off with an overview of available offerings, cover intended use cases and the pros and cons of each offering, and make a comparison with self-managed alternatives. We'll also talk about what tools are available to move on-premises file storage to the cloud.

Geert Jansen, Amazon Web Services

Geert Jansen is a Senior Product Manager at Amazon Web Services where he works on Amazon EFS. He was Product Owner for Red Hat CloudForms and also worked at Ravello Systems and Royal Dutch Shell. He received an M.Sc. in Applied Physics from the Eindhoven University of Technology.

Jacob Strauss, Amazon Web Services

Jacob Strauss is a Principal Engineer at Amazon Web Services, currently working on the Amazon Elastic File System. He has been at AWS since 2013 building distributed systems under the guise of storage services, and growing Amazon's Boston-area engineering teams. He received PhD and undergraduate degrees in computer science from MIT.

3:00 pm–3:30 pm

Break with Refreshments

Grand Ballroom Foyer

3:30 pm–5:00 pm

Afternoon Tutorial 2

Performance Analysis in Linux Storage Stack with BPF

Monday, 3:30 pm–5:00 pm

Taeung Song, KossLab, and Daniel T. Lee, The University of Soongsil

How can we deeply analyze and trace performance issues in the Linux Storage Stack?

Many monitoring and benchmark tools help us find bottlenecks and problems through system profiling. However, it is quite tricky to dig deeper into the root cause at the code/function level because of complex execution flow (e.g., multiple contexts or asynchronous flow). In this tutorial, we introduce in-kernel BPF technology and practice analyzing performance issues in the Linux storage stack using several tracing tools (BPF, uftrace, ctracer, and perf) with attendees, step by step. This session is targeted towards administrators, researchers, and developers.

BPF is a technology that allows safely injecting and executing custom code in the kernel at runtime, an unprecedented capability. By leveraging custom code injected into the kernel, BPF-based profiling and tracing incur low overhead and make much richer introspection available.
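As a hedged taste of the kind of BPF-based analysis practiced in the tutorial (not the tutorial's own material), the sketch below uses the BCC Python bindings to build a latency histogram for vfs_read(); root privileges and a kernel with BPF support are assumed.

    # Minimal sketch: measure vfs_read() latency with a BPF histogram (BCC).
    import time
    from bcc import BPF

    prog = r"""
    #include <uapi/linux/ptrace.h>

    BPF_HASH(start, u32, u64);
    BPF_HISTOGRAM(lat_us);

    int trace_entry(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        u64 ts = bpf_ktime_get_ns();
        start.update(&tid, &ts);
        return 0;
    }

    int trace_return(struct pt_regs *ctx) {
        u32 tid = bpf_get_current_pid_tgid();
        u64 *tsp = start.lookup(&tid);
        if (tsp == 0)
            return 0;
        u64 delta = bpf_ktime_get_ns() - *tsp;
        lat_us.increment(bpf_log2l(delta / 1000));
        start.delete(&tid);
        return 0;
    }
    """

    b = BPF(text=prog)
    b.attach_kprobe(event="vfs_read", fn_name="trace_entry")
    b.attach_kretprobe(event="vfs_read", fn_name="trace_return")

    print("Tracing vfs_read() latency for 10 seconds...")
    time.sleep(10)
    b["lat_us"].print_log2_hist("usecs")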

Taeung Song, KossLab

Taeung is a Software Engineer at KOSSLAB (Korea Opensource Software Developers Lab) and an open source contributor to tracing and profiling technologies such as perf, uftrace, and BPF.

Daniel T. Lee, The University of Soongsil

Daniel T. Lee is a Bachelor's degree student at the University of Soongsil with a deep enthusiasm for Linux. He has been contributing to uftrace, a function (graph) tracer, since 2018. He is passionate about tracing and profiling, and he really loves cloud engineering. He has a deep interest in this field, has written a book about API gateways entitled "Kong: Becoming a King of API Gateways," and has shared many technical presentations.

7:00 pm–10:00 pm

Birds-of-a-Feather Sessions (BoFs)

The evening Birds-of-a-Feather Sessions (BoFs) will be a forum for open discussion on topics of interest to the community. A few participants will present short introductions, overviews, or status reports from projects relevant to each of the topics, followed by informal discussion and participation. If you wish to present on a particular topic and have not already been contacted, please send us a short proposal at vault19chairs@usenix.org.

Tuesday, February 26

7:30 am–8:30 am

Continental Breakfast

Grand Ballroom Foyer

8:30 am–10:15 am

Failures & Faults & More

Making Ceph Fast in the Face of Failure

Tuesday, 8:30 am–9:00 am

Neha Ojha, Red Hat

Ceph Luminous and Mimic reduce the impact of recovery on client I/O. In this talk, we'll discuss the key features involved and how Ceph users can take advantage of them.

Neha Ojha, Red Hat

Neha is a Senior Software Engineer at Red Hat. She is the project technical lead for the core team focusing on RADOS. Neha holds a Master's degree in Computer Science from the University of California, Santa Cruz.

Her most recent talks have been at Mountpoint (co-located with Open Source Summit NA 2018) and at Ceph Day Silicon Valley.

CrashMonkey: Finding File System Crash-Consistency Bugs with Bounded Black-Box Testing

Tuesday, 9:00 am–9:25 am

Jayashree Mohan, The University of Texas at Austin

Available Media

We present a new approach to testing file-system crash consistency: bounded black-box crash testing (B3). B3 tests the file system in a black-box manner using workloads of file-system operations. Since the space of possible workloads is infinite, B3 bounds this space based on parameters such as the number of file-system operations or which operations to include, and exhaustively generates workloads within this bounded space. Each workload is tested on the target file system by simulating power-loss crashes while the workload is being executed, and checking if the file system recovers to a correct state after each crash. B3 builds upon insights derived from our study of crash-consistency bugs reported in Linux file systems in the last five years. We observed that most reported bugs can be reproduced using small workloads of three or fewer file-system operations on a newly-created file system, and that all reported bugs result from crashes after fsync() related system calls. We built the tool CrashMonkey to demonstrate the effectiveness of this approach. CrashMonkey revealed 10 new crash-consistency bugs in widely-used, mature Linux file systems, seven of which existed in the kernel since 2014. It also revealed a data loss bug in a verified file system, FSCQ. The new bugs result in severe consequences like broken rename atomicity and loss of persisted files.
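For illustration only (this is not CrashMonkey code), the sketch below shows the flavor of a small, fsync-heavy workload of the kind B3 enumerates: a write, a rename, and fsyncs whose effects must survive a simulated crash.

    # Hedged sketch of a tiny three-operation, fsync-related workload.
    import os

    d = "testdir"
    os.makedirs(d, exist_ok=True)

    fd = os.open(os.path.join(d, "tmp"), os.O_CREAT | os.O_WRONLY, 0o644)
    os.write(fd, b"payload")      # op 1: write data to a temp file
    os.fsync(fd)                  # persist the file data
    os.close(fd)

    os.rename(os.path.join(d, "tmp"), os.path.join(d, "final"))  # op 2: rename
    dfd = os.open(d, os.O_DIRECTORY)
    os.fsync(dfd)                 # op 3: fsync the parent directory
    os.close(dfd)
    # A tool like CrashMonkey simulates power loss at points during this run and
    # checks that the file system recovers to a state consistent with the
    # completed fsyncs (e.g., rename atomicity holds, persisted data survives).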

Jayashree Mohan, The University of Texas at Austin

Jayashree Mohan is a third-year PhD student at the University of Texas at Austin. She works primarily on file and storage systems with a focus on testing the reliability of file systems. Prior to starting her PhD, she received a B.Tech in CS at the National Institute of Technology, Karnataka, India, in 2016. Over the past two summers, she has interned at MSR India and MSR Cambridge, where she's given a couple of presentations about her ongoing research. She also presented her work on CrashMonkey at OSDI '18.

Experiences with Fuse in the Real World

Tuesday, 9:25 am–9:50 am

Manoj Pillai, Raghavendra Gowdappa, and Csaba Henk, Red Hat

Available Media

The Filesystem in Userspace (FUSE) module provides a simple way to create user-space file systems. The shortcomings of this approach to implementing file systems have been debated many times in the past, a few times even with data to back up the arguments. In this talk, we will revisit the topic in the context of a distributed Software-Defined Storage (SDS) solution, gluster. We will present our experiences based on users deploying it in production over the years, with FUSE access as the primary interface. In this context, we will discuss some of the problem areas, like memory management, and demonstrate trade-offs in implementing important caches in user space versus relying on kernel caches.

As gluster expands to newer use-cases like persistent storage for container platforms, it needs to efficiently handle a wide variety of workloads and more frequently handle smaller, single-client volumes. In this context, we see the need to absorb more recent FUSE performance enhancements like write-back caching, and we will present our characterization of the performance benefits obtained from these enhancements.

Manoj Pillai, Red Hat

Manoj Pillai is part of the Performance and Scale Engineering Group at Red Hat. His focus is on storage performance, particularly around gluster, and he has presented on these topics at Open Source Summit, FOSDEM, Vault 2017, Red Hat Summit and Gluster Summit.

Raghavendra Gowdappa, Red Hat

Raghavendra Gowdappa is one of the maintainers of Glusterfs and is currently employed by Red Hat. He has worked on interfacing Glusterfs with FUSE, and on the caching, network, and file distribution aspects of Glusterfs. His earlier presentations were at FOSDEM, Vault 2017, and Gluster Summit.

Csaba Henk, Red Hat

Csaba Henk has worked on the FUSE layer of Glusterfs since the early days. He has been involved in augmentative and integration projects, like geo-replication and the OpenStack Manila glusterfs drivers. These days he's back on core Glusterfs and works on caches and FUSE.

SMB3 Linux/POSIX Protocol Extensions: Overview and Update on Current Implementations

Tuesday, 9:50 am–10:15 am

Jeremy Allison, Samba Team, Google, and Steve French, Samba Team, Microsoft Azure Storage

Available Media

The SMB3 POSIX Extensions, a set of protocol extensions to allow for optimal Linux and Unix interoperability with Samba, NAS and Cloud file servers, have evolved over the past year, with test implementations in Samba and now merged into the Linux kernel. These extensions address various compatibility problems for Linux and Unix clients (such as case sensitivity, locking, delete semantics and mode bits among others). This presentation will review the state of the protocol extensions, what was learned in the implementations in Samba and also in the Linux kernel (including from running exhaustive Linux file system functional tests to try to better match local file system behavior over SMB3 mounts) and what it means for real applications.

With the deprecation of older, less secure dialects like CIFS (which had standardized POSIX Extensions documented by SNIA), these SMB3 POSIX Extensions urgently need to be deployed more broadly to avoid functional or security problems and to optimally access Samba from Linux.

Steve French, Samba team, Microsoft Azure Storage

Steve French is a member of the Samba team, and is also the original author (and maintainer) of the CIFS (SMB3) VFS, the module used to access Samba, Windows, Mac, NAS and Azure cloud from Linux. He is a frequent speaker at Storage and Samba events.

10:15 am–10:45 am

Break with Refreshments

Grand Ballroom Foyer

10:45 am–12:00 pm

Build It Fast, Really Fast

From Open-Channel SSDs to Zoned Namespaces

Tuesday, 10:45 am–11:10 am

Matias Bjørling, Western Digital

Available Media

Open-Channel Solid State Drive architectures are being adopted rapidly by hyperscalers, all-flash array vendors, and large storage system vendors. The versatile storage interface allows a solid state drive to expose essential knobs to control latency, I/O predictability, and I/O isolation. The rapid adoption has created a diverse set of Open-Channel SSD drive specifications, each solving the needs of a single user or a few users. However, the specifications are yet to be standardized.

The Zoned Namespaces (ZNS) Technical Proposal in the NVMe workgroup is developing an industry standard for these types of interfaces, creating a foundation on which a robust software ecosystem can be built and implementation efforts can be streamlined.

This talk covers the motivation, characteristics of Zoned Namespaces, possible software improvements, and early results to show off the effectiveness of these types of drives.

Matias Bjørling, Western Digital

Matias Bjørling is Director of Solid State System Software at Western Digital. He is the author of the Open-Channel SSD 1.2 and 2.0 specifications and maintainer of the Open-Channel SSD subsystem in the Linux kernel. Before joining the industry, he obtained a Ph.D. in operating systems and non-volatile storage, doing performance characterization of flash-based SSDs; he also worked on the Linux kernel blk-mq block layer and began the early work on the Open-Channel SSD interface.

New Techniques to Improve Small I/O Workloads in Distributed File Systems

Tuesday, 11:10 am–11:35 am

Dan Lambright, Huawei

Available Media

Distributed file systems work well with high throughput applications that are parallelizable. Due to network overhead, they tend to perform less well with workloads that are meta-data or small-file intensive. This problem has been closely studied, resulting in many innovative ideas. For example, researchers have proposed storing inodes in column-store databases to speed up directory reads. Another idea is to have file systems publish “snapshots” visible to a subset of clients during metadata creation, which are later subscribed to by the rest of the system.

Are these techniques practical outside university labs? To answer this question, we introduce software that makes the original implementations much easier to use, by acting as a layer on top of Ceph object storage. The talk will walk through how to set up and run the configuration in realistic environments. The original research will be described in detail, explaining how improved performance comes with some loss of POSIX generality, along with a small number of new operational steps outside of traditional file system workflows. The talk will show how this solution could be a good fit for analytics use cases where file system semantics are needed and there is flexibility at the application level.

Dan Lambright, Huawei

Dan has worked in open source storage at Red Hat and also at AWS. Today he is building distributed storage at Huawei. He has spoken at Vault, LinuxCon, OpenStack, LISA, and other venues. He also enjoys teaching at the University of Massachusetts Lowell.

Optimizing Storage Performance for 4–5 Million IOPs

Tuesday, 11:35 am–12:00 pm

James Smart, Broadcom

Available Media

New workloads and Storage Class Memory (SCM) are demanding a new level of IOPs, bandwidth, and driver optimizations in Linux for storage networks. James Smart will discuss how the lpfc driver was recently reworked to achieve a new level of driver performance, reaching 5+ million IOPs. James will discuss hardware parallelization, per-core WQs, interrupt handling, and shared resource management that will benefit both SCSI and NVMe over Fabrics performance. James will show performance curves, discuss Linux OS issues encountered, and describe the work yet to be done in Linux to improve performance even more.

James Smart, Broadcom

James Smart is currently a Distinguished Engineer at Broadcom, responsible for the architecture of Broadcom's Fibre Channel Linux stack. James has worked in storage software and firmware development for 32 years. James is a member of the T11 and NVM Express standards groups. James is also the author and maintainer of the Linux SCSI and NVMe FC transports as well as the maintainer of the Emulex Linux lpfc driver.

12:00 pm–1:30 pm

Conference Luncheon

Back Bay Ballroom D

1:30 pm–3:10 pm

Build It Big, Really Big & Smart

Design of a Composable Infrastructure Platform

Tuesday, 1:30 pm–1:55 pm

Brian Pawlowski, Drivescale Inc.

Available Media

Composable Infrastructure in this talk is a method for the dynamic creation of secure application clusters from disaggregated compute, storage and networking. The problems facing such a solution are ones of availability, durability, scalability, performance and most importantly correctness.

The target applications are widely deployed data analytics and NoSQL database applications that can consist of 100s to 1,000s of compute nodes with 10,000s of disks for each application in a secure cluster instance.

The talk consists of five parts. We present a very brief description of the user view of creating virtual clusters on a composable infrastructure platform. We follow this with a short description of the problems and requirements for the platform. That motivates the bulk of the presentation, describing the state machine design for a correct and durable orchestration platform that scales to 100,000s of managed elements. Select code and data structures are used to point out implementation details. The fourth part of the talk describes how standard Linux networking and storage subsystems are managed and used to create virtual clusters (including NVMe over Fabrics), and the open source components used by the platform to achieve scale, availability, and security. The final part of the talk details key failure scenarios and the recovery mechanisms that maintain correctness and availability.

Brian Pawlowski, Drivescale Inc.

Brian Pawlowski is currently CTO of Drivescale Inc. where he is involved in the design of software to support cluster computing and developing a platform for composable infrastructure.

As Vice President and Chief Architect at Pure Storage, he was focused on product and architecture development, with an eye toward simplifying the user experience. Earlier, Brian was a Senior Vice President and Chief Technology Officer at NetApp for more than six years, leading the design of high-performance, highly reliable storage systems. Brian led the design for NetApp's first SAN product and holds several patents related to that work. He has also worked on open protocols for storage since an earlier role at Sun Microsystems; he was co-author of the NFS Version 3 specification and is currently co-chair of the NFS Version 4 working group at the Internet Engineering Task Force (IETF).

Brian has presented on several occasions at USENIX conferences, including previously at Vault 2015 and SANE 2004. He has presented technology and product architecture at many industry events.

The Storage Architecture of Intel's Data Management Platform (DMP)

Tuesday, 1:55 pm–2:20 pm

David Cohen, Intel, and Philipp Reisner, LINBIT

Available Media

This talk will discuss the Storage Architecture employed by Intel's Data Management Platform (DMP). The DMP is a rack-centric, cluster design that employs an Ethernet-based fabric as its cluster interconnect. The default is a 3-stage Clos topology. The cluster's storage provides no redundancy and instead puts the burden on stateful micro-services to deal with their own redundancy requirements.

We will provide an overview of the DMP. Next, we'll drill into the details of the Storage subsystem, which is composed of Intel's RSD Pod Manager along with LINBIT's LINSTOR storage orchestrator. In this section of the talk, we will include a performance characterization of the two volume types using FIO.

A DMP cluster is managed by Kubernetes with network and storage resources managed by Container Network and Storage Interface (CNI/CSI) providers. While DMP volumes provide no redundancy they are persistent and have a zone label attached to them. This use of the Kubernetes zone label concept is a key aspect of the DMP storage implementation as it ensures stateful micro-services being hosted on the platform are distributed across the cluster's fault domains. The stateful micro-service is then responsible for providing sufficient data redundancy to satisfy its availability and durability requirements.
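As a hedged illustration of the zone-label mechanism described above (not DMP code), the sketch below uses the Kubernetes Python client to read the zone label on each node; the label key shown is the common upstream one, and an actual deployment may use a different key.

    # Minimal sketch: list nodes and the fault-domain (zone) label on each.
    from kubernetes import client, config

    config.load_kube_config()
    core = client.CoreV1Api()

    ZONE_LABEL = "failure-domain.beta.kubernetes.io/zone"  # assumed label key

    for node in core.list_node().items:
        zone = (node.metadata.labels or {}).get(ZONE_LABEL, "<unlabeled>")
        print(f"{node.metadata.name}: zone={zone}")
    # A scheduling constraint (e.g., pod anti-affinity on this label) can then
    # keep replicas of a stateful micro-service in different fault domains.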

(i) NVMe-over-Fabric (NVMe-oF) Based Remote Logical Volumes Optimized for Large Sequential I/O

The DMP disaggregates physical storage devices from compute servers to allow storage capacity to scale independent of compute. The disaggregated storage devices are then pooled by an open-source, cluster-wide volume manager called LINSTOR. LINBIT's framework is integrated with the cluster's k8s-based Orchestration/Scheduler function via LINBIT's Container Storage Interface (CSI) implementation. Logical volumes are provisioned from this pool and made available via NVMe-oF to k8s-managed Pods running on the compute servers. These logical volumes are optimized for large sequential I/Os and are used to replace HDDs.

(ii) Local Logical Volumes Optimized for Optane DC Persistent Memory (DCPM)

Compute servers in the DMP are outfitted with Optane DCPM. These persistent DIMMs are also pooled by LINBIT's LINSTOR and made available through Kubernetes as logical volumes. In the case of Optane DCPM, LINSTOR uses LVM to carve/provision logical volumes out of an NVDIMM namespace.

After we review the Storage subsystem we will provide overviews of two workloads that are priorities for initial DMP deployments. The first of these is a Spark-based AI/Analytics Pipeline that uses Minio's s3-compatible object store as a replacement for HDFS. The second of these workloads is a MySQL/MariaDB transactional database on shared storage. To the best of our knowledge, this is the first open source transactional database that supports shared storage.

Finally, we'll conclude with an update on the status of the DMP effort, review preliminary performance results, and provide a few parting thoughts on the next steps for the DMP.

David Cohen, Intel

Dave Cohen is the Storage Solutions CTO in Intel's Nonvolatile Memory and Storage (NVMS) Group and the Chief Architect of Intel's Data Management Platform. This is a rack-centric, physical cluster that builds on the Intel Rack Scale Design (RSD). Dave has been at Intel for five years working on a variety of network and storage related solutions. His career spans over 30 years working on large scale, distributed systems for Fortune 500 companies across several industry segments.

Philipp Reisner, LINBIT

Philipp Reisner is the founder and CEO of LINBIT and the author of DRBD. LINBIT grew out of a local Linux service provider into an international support provider for open source HA cluster software (focusing particularly on Pacemaker and DRBD). DRBD was started in 2000 and became part of the Linux kernel with version 2.6.33. Philipp has guided LINBIT's upstream contributions to a number of open source infrastructure and clustering projects, including Kubernetes and OpenStack Cinder.

scoutfs: Large Scale POSIX Archiving

Tuesday, 2:20 pm–2:45 pm

Zach Brown, Versity, Inc.

Available Media

scoutfs is an open source clustered POSIX file system built to support archiving of very large file sets. This talk will quickly summarize the challenges faced by sites that are managing large archives. We'll then explore the technical details of the persistent structures and network protocols that allow scoutfs to efficiently update and index file system metadata concurrently across a cluster. We'll see the interfaces that scoutfs provides on top of these mechanisms which allow management software to track the life cycle of billions of archived files.

Zach Brown, Versity, Inc.

Zach Brown has been working on the Linux kernel for a while now and has most recently focused on file systems, particularly Lustre, OCFS2, and btrfs. He's also helped organize previous Linux storage workshops and has given talks at Linux conferences including OLS, LCA, and LinuxTag.

Skyhook: Programmable Storage for Databases

Tuesday, 2:45 pm–3:10 pm

Jeff LeFevre, University of California, Santa Cruz, and Noah Watkins, Red Hat

Available Media

Ceph is an open source distributed storage system that is object-based and massively scalable. Ceph provides developers with the capability to create data interfaces that can take advantage of local CPU and memory on the storage nodes (Ceph Object Storage Devices). These interfaces are powerful for application developers and can be created in C, C++, and Lua.

Skyhook is an open source storage and database project in the Center for Research in Open Source Software at UC Santa Cruz. Skyhook uses these capabilities in Ceph to create specialized read/write interfaces that leverage IO and CPU within the storage layer toward database processing and management. Specifically, we develop methods to apply predicates locally as well as additional metadata and indexing capabilities using Ceph's internal indexing mechanism built on top of RocksDB.

Skyhook's approach helps to enable scale-out of a single node database system by scaling out the storage layer. Our results show the performance benefits for some queries indeed scale well as the storage layer scales out.
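As a hedged illustration of the underlying mechanism, not Skyhook's actual interface, the sketch below assumes the python-rados bindings expose Ioctx.execute() for invoking an object-class method on the OSD; the pool, object, class, and method names are hypothetical placeholders.

    # Minimal sketch: push a filter to the storage layer via an object-class call.
    import rados

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("skyhook-pool")     # placeholder pool name

    predicate = b"age > 30"                        # placeholder predicate encoding
    ret, filtered_rows = ioctx.execute("table.obj.0", "skyhook", "select",
                                       predicate, length=1 << 20)
    print(f"server-side filter returned {len(filtered_rows)} bytes (rc={ret})")

    ioctx.close()
    cluster.shutdown()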

Jeff LeFevre, University of California, Santa Cruz

Jeff LeFevre is an Assistant Adjunct Professor of Computer Science and Engineering at UC Santa Cruz, where he does data management research and leads the Skyhook project within the Center for Research in Open Source Software (CROSS). He received his PhD from UC Santa Cruz with work on database physical design, then joined HP Vertica R&D, where he worked on connecting Vertica and Spark for two years before returning to academia to work on Skyhook. His previous speaking experience includes multiple conference presentations (SIGMOD) and invited talks on his research papers.

Noah Watkins, Red Hat

Noah Watkins is a software engineer at Red Hat. He received his PhD from UC Santa Cruz in 2018 where he focused his research on the programmability of distributed storage systems.

3:10 pm–3:45 pm

Break with Refreshments

Grand Ballroom Foyer

3:45 pm–5:25 pm

Build It Safe, Build It Deep

Deep Dive into Ceph Block Storage

Tuesday, 3:45 pm–4:10 pm

Mahati Chamarthy, Intel

Available Media

Ceph's object storage system allows users to mount Ceph as a thin-provisioned block device known as the RADOS Block Device (RBD). This talk aims to delve deep into RBD, its design, and its features. In this session, we will discuss:

  • What entails creating an RBD image—rbd data and metadata
  • Prominent features like Striping, Snapshots, and Cloning
  • How RBD is configured in a virtualized setup using libvirt/qemu
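A minimal, hedged sketch of the create/snapshot/clone features listed above, assuming the python-rados and python-rbd bindings are installed and a pool named "rbd" exists (all names are placeholders):

    # Minimal sketch: create an RBD image, snapshot it, and clone the snapshot.
    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    ioctx = cluster.open_ioctx("rbd")               # placeholder pool name

    rbd_inst = rbd.RBD()
    rbd_inst.create(ioctx, "base-image", 4 * 1024**3)   # 4 GiB thin-provisioned image

    with rbd.Image(ioctx, "base-image") as image:
        image.create_snap("golden")                  # point-in-time snapshot
        image.protect_snap("golden")                 # clones require a protected snapshot

    # Copy-on-write clone of the protected snapshot into a new image
    rbd_inst.clone(ioctx, "base-image", "golden", ioctx, "clone-1")

    ioctx.close()
    cluster.shutdown()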

Mahati Chamarthy, Intel

Mahati Chamarthy has been contributing to storage technologies for the past few years. She was a core developer for OpenStack Object Storage (Swift) and is now an active contributor to Ceph. She works as a Cloud Software Engineer with Intel's Open Source Technology Center, focusing on storage software.

Mindcastle.io: Secure Distributed Block Device for Edge and Cloud

Tuesday, 4:10 pm–4:35 pm

Jacob Gorm Hansen, Vertigo.ai

Available Media

Camera-based smart IoT sensors are soon going to be everywhere. The recent success of Deep Neural Networks (DNNs) has opened the door to new computer vision and AI applications. While initial deployments use high-end server-class hardware with expensive and power-hungry GPUs, optimizations and algorithmic improvements will soon make running the inference side of DNNs on low-cost edge computing devices commonplace. These devices will need software, and this software needs to be continually updated, both to keep pace with the rapid development of machine learning/AI methods and datasets, and to keep their operating system and middleware installs tamper-proof and secure. To this end, we have been building Mindcastle, a serverless distributed block storage system with strong cryptographic integrity, built-in compression, and incremental atomic updates. Mindcastle is based on a highly performant and flash-friendly LSM-like data structure, first developed at Bromium, where it served as the storage foundation for Bromium's Xen-derived uXen hypervisor and has hosted millions of strongly isolated Micro-VMs across many security-sensitive installations worldwide.

Jacob Gorm Hansen, Vertigo.ai

Jacob Gorm Hansen is the founder of Vertigo.ai, an AI startup that focuses on AI for edge computing. Jacob has a long track record of innovative computer systems development and research. After cutting his teeth as a senior programmer on the Hitman games franchise, he returned to academia, where his first major research contribution, the award-winning “VM live migration” technique for uninterrupted relocation of workloads in a data center, became an enabling technology of cloud computing that is deployed by all major cloud providers today. His research into distributed storage systems was the enabler of VMware's VSAN business unit, and the IO efficiency and memory compression algorithmic breakthroughs he made at Bromium made strong virtualization-backed endpoint security more efficient than unsecured native execution. Jacob holds a BA in psychology, and a BS, MSc, and Ph.D. in computer science, all from the University of Copenhagen, and has been a visiting scholar at the University of Washington. Jacob has published several papers on computer systems and holds a number of US patents. He is a recipient of the EuroSys Roger Needham Ph.D. award and co-author of the USENIX Test-of-Time-awarded "Live Migration of Virtual Machines" USENIX NSDI paper, for which his Master's thesis served as inspiration. Jacob has given invited lectures at the universities of Aalborg, Aarhus, Dresden, Erlangen, Potsdam, and Washington, at Trinity College Dublin, at numerous VMware engineering events and off-sites, at Bell Labs Dublin, and at Microsoft Research labs in Seattle, Silicon Valley, and Cambridge, UK.

IO and cgroups, the Current and Future Work

Tuesday, 4:35 pm–5:00 pm

Josef Bacik, Facebook

Available Media

Resource isolation when it comes to IO has been incomplete for years, making it very hard to build a completely isolated solution for containers in Linux. Recently, with the development of blk-iolatency, this has started to change, which hopefully marks the start of being able to build systems with complete resource isolation.
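As a hedged sketch of how the blk-iolatency controller is driven from user space (paths, the device major:minor, and the target value are placeholders; see the kernel's cgroup-v2 documentation for the exact units and semantics of io.latency):

    # Minimal sketch: set an io.latency target for a cgroup and join it.
    import os

    CG = "/sys/fs/cgroup/demo"      # assumes a unified (cgroup v2) hierarchy
    os.makedirs(CG, exist_ok=True)

    # Enable the io controller for children of the root cgroup.
    with open("/sys/fs/cgroup/cgroup.subtree_control", "w") as f:
        f.write("+io")

    # Set a latency target for device 8:0 (placeholder major:minor and value).
    with open(os.path.join(CG, "io.latency"), "w") as f:
        f.write("8:0 target=10")

    # Move the current process into the cgroup so its IO is governed by the target.
    with open(os.path.join(CG, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))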

Josef Bacik, Facebook

Josef is a software engineer at Facebook and is one of the maintainers and core developers of Btrfs, NBD, and the blk-iolatency controller. He has spoken at a variety of conferences, including the Linux Storage, Filesystem, and Memory Management Summit, the Linux Plumbers Conference, and a variety of LinuxCons and other Linux Foundation end-user conferences.

Self-Encrypting Drive (SED) Standardization Proposal for NVDIMM-N Devices

Tuesday, 5:00 pm–5:25 pm

Frederick Knight and Sridhar Balasubramanian, NetApp

Available Media

A non-volatile DIMM (NVDIMM) is a Dual In-line Memory Module (DIMM) that maintains the contents of Synchronous Dynamic Random Access Memory (SDRAM) during power loss. An NVDIMM-N class device can be integrated into standard compute or storage platforms to provide non-volatility of the data in the DIMM. An NVDIMM-N relies on a byte-addressable energy-backed function to preserve the data in case of power failure. A Byte Addressable Energy Backed Function is backed by a combination of SDRAM and non-volatile memory (e.g., NAND flash) on the NVDIMM-N. JESD245C Byte-Addressable Energy Backed Interface (BAEBI) defines the programming interface for the NVDIMM-N class of devices.

An NVDIMM-N achieves non-volatility by:

  • performing a Catastrophic Save operation to copy SDRAM contents into NVM when host power is lost using an Energy Source managed by either the module or the host
  • performing a Restore operation to copy contents from the NVM to SDRAM when power is restored

An NVDIMM-N device may be of a self-encrypting device (SED) type that protects data at rest. This means the NVDIMM-N controller:

  • encrypts data during a Catastrophic Save operation
  • decrypts data during a Restore operation and the data is:
    • plaintext while sitting in SDRAM
    • ciphertext while sitting in NVM (e.g., flash memory)

Typically, an NVDIMM-N device may be used within a storage controller to accelerate performance for storage workloads, or as auxiliary storage to preserve debug information in case of power failure. When an NVDIMM-N device is used as a caching layer, transient data is staged in the NVDIMM-N device before the data is persisted/committed to the storage media. NVDIMM-N devices are also used as persistent storage media for staging memory dump files when critical failures occur at the storage subsystem level, before the system goes down.

The NVDIMM-N encryption standardization proposal involves cross-pollination between JEDEC (proposed BAEBI extensions to define security protocols in conjunction with encryption capability on the device) and TCG standards (proposed TCG Storage Interface Interactions Specifications content for handling self-encrypting NVDIMM-Ns plus adapting TCG Ruby SSC for NVDIMM-N devices) with industry sponsorship from HPE and NetApp.

The talk will begin with a brief overview of NVDIMM-N devices and associated storage-centric use cases, followed by an overview of the NVDIMM-N encryption scheme and the proposed self-encrypting device standardization approach for NVDIMM-N devices, which involves the following:

  1. Extensions to the BAEBI specification to accommodate security protocol definitions in conjunction with the encryption capability of NVDIMM-N devices
  2. Extensions to the TCG Storage Interface Specifications defining the Security Protocol Typed Block for handling interactions with NVDIMM-N devices
  3. Adapting the TCG Ruby SSC standard to accommodate the NVDIMM-N class of devices

The talk will conclude by summarizing the current state of the standardization proposal and the approval process with the JEDEC and TCG working groups.

Frederick Knight, NetApp

Frederick Knight is a Principal Standards Technologist at NetApp Inc. Fred has over 40 years of experience in the computer and storage industry. He currently represents NetApp in several National and International Storage Standards bodies and industry associations, including T10 (SCSI), T11 (Fibre Channel), T13 (ATA), IETF (iSCSI), SNIA, and JEDEC. He was the chair of the SNIA Hypervisor Storage Interfaces working group, the primary author of the SNIA HSI White Paper, the author of the new IETF iSCSI update RFC, and the editor for the T10 SES-3 standard. He is also the editor for the SCSI Architecture Model (SAM-6) and the Convenor for the ISO/IEC JTC-1/SC25/WG4 international committee (which oversees the international standardization of T10/T11/T13 documents). Fred has received several NetApp awards for excellence and innovation as well as the INCITS Technical Excellence Award for his contributions to both T10 and T11 and the INCITS Merit Award for his longstanding contributions to the international work of INCITS.

He is also the developer of the first native FCoE target device in the industry. At NetApp, he contributes to technology and product strategy and serves as a consulting engineer to product groups across the company. Prior to joining NetApp, Fred was a Consulting Engineer with Digital Equipment Corporation, Compaq, and HP where he worked on clustered operating system and I/O subsystem design.

Sridhar Balasubramanian, NetApp

Sridhar Balasubramanian is a Principal Security Architect within the Product Security Group at NetApp RTP. With over 25 years in the software industry, Sridhar is the inventor/co-inventor of 16 US patents and has published 5 conference papers to date. Sridhar's areas of expertise include storage and information security, security assurance, secure software development lifecycle, secure protocols, and storage management. Sridhar holds Master's degrees in Physics and Electrical Engineering.

5:25 pm–7:00 pm

Dinner (on your own)

7:00 pm–10:00 pm

Birds-of-a-Feather Sessions (BoFs)

The evening Birds-of-a-Feather Sessions (BoFs) will be a forum for open discussion on topics of interest to the community. A few participants will present short introductions, overviews, or status reports from projects relevant to each of the topics, followed by informal discussion and participation. If you wish to present on a particular topic and have not already been contacted, please send us a short proposal at vault19chairs@usenix.org.