David Cohen, Intel, and Philipp Reisner, LINBIT
This talk will discuss the storage architecture employed by Intel's Data Management Platform (DMP). The DMP is a rack-centric cluster design that uses an Ethernet-based fabric as its cluster interconnect. The default fabric is a 3-stage Clos topology. The cluster's storage provides no redundancy; instead, it puts the burden on stateful micro-services to handle their own redundancy requirements.
We will provide an overview of the DMP. Next, we'll drill into the details of the storage subsystem, which is composed of Intel's RSD Pod Manager along with LINBIT's LINSTOR storage orchestrator. In this section of the talk, we will include a performance characterization of the DMP's two volume types using FIO.
A DMP cluster is managed by Kubernetes, with network and storage resources managed by Container Network Interface (CNI) and Container Storage Interface (CSI) providers. While DMP volumes provide no redundancy, they are persistent and have a zone label attached to them. This use of the Kubernetes zone label concept is a key aspect of the DMP storage implementation, as it ensures that stateful micro-services hosted on the platform are distributed across the cluster's fault domains. Each stateful micro-service is then responsible for providing sufficient data redundancy to satisfy its own availability and durability requirements.
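The role of the zone label can be illustrated with a small sketch. This is not the DMP's or Kubernetes' scheduling logic; it is a toy model that assumes each volume carries the well-known `topology.kubernetes.io/zone` label and that a fault domain corresponds to a rack. The volume names, the `rack-N` zone values, and the `spread_across_zones` helper are all hypothetical.

```python
from collections import defaultdict

# Well-known Kubernetes topology label key; the zone values below are made up.
ZONE_LABEL = "topology.kubernetes.io/zone"


def spread_across_zones(volumes, replicas):
    """Pick one volume per zone so that no two replicas share a fault domain.

    Toy illustration only: zones are taken in sorted order and the first
    volume listed in each zone is used.
    """
    by_zone = defaultdict(list)
    for vol in volumes:
        by_zone[vol["labels"][ZONE_LABEL]].append(vol["name"])

    if replicas > len(by_zone):
        raise ValueError(
            f"need {replicas} distinct zones, only {len(by_zone)} available"
        )

    return [(zone, by_zone[zone][0]) for zone in sorted(by_zone)[:replicas]]


# Hypothetical, non-redundant volumes labeled with the rack they live in.
volumes = [
    {"name": "vol-a", "labels": {ZONE_LABEL: "rack-1"}},
    {"name": "vol-b", "labels": {ZONE_LABEL: "rack-1"}},
    {"name": "vol-c", "labels": {ZONE_LABEL: "rack-2"}},
    {"name": "vol-d", "labels": {ZONE_LABEL: "rack-3"}},
]

placement = spread_across_zones(volumes, replicas=3)
for zone, name in placement:
    print(f"{zone}: {name}")
```

Because each volume on its own offers no redundancy, a three-replica service placed this way can lose any single rack and still retain two copies of its data, which is the property the zone label is meant to make possible.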
(i) NVMe-over-Fabrics (NVMe-oF) Based Remote Logical Volumes Optimized for Large Sequential I/O
The DMP disaggregates physical storage devices from compute servers so that storage capacity can scale independently of compute. The disaggregated storage devices are then pooled by an open-source, cluster-wide volume manager called LINSTOR. LINSTOR is integrated with the cluster's Kubernetes-based orchestration and scheduling function via LINBIT's Container Storage Interface (CSI) implementation. Logical volumes are provisioned from this pool and made available via NVMe-oF to Kubernetes-managed Pods running on the compute servers. These logical volumes are optimized for large sequential I/Os and are used to replace HDDs.
(ii) Local Logical Volumes Optimized for Optane DC Persistent Memory (DCPM)
Compute servers in the DMP are outfitted with Optane DCPM. These persistent DIMMs are also pooled by LINSTOR and made available to Kubernetes as logical volumes. In the case of Optane DCPM, LINSTOR uses LVM to carve logical volumes out of an NVDIMM namespace.
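Both volume types follow the same pattern: physical capacity is pooled, and logical volumes are carved out of the pool on request. The sketch below is a toy model of that pattern only. It does not use LINSTOR's or LVM's actual interfaces, and the `Device` and `Pool` classes, the device names, and the "most free space first" placement rule are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A pooled backing device (e.g. an NVMe drive or an NVDIMM namespace)."""
    name: str
    capacity_gib: int
    used_gib: int = 0

    @property
    def free_gib(self) -> int:
        return self.capacity_gib - self.used_gib


@dataclass
class Pool:
    """Toy capacity pool that carves logical volumes out of its devices."""
    devices: list
    volumes: dict = field(default_factory=dict)

    def provision(self, name: str, size_gib: int) -> str:
        # Assumed placement rule: use the device with the most free space.
        candidates = [d for d in self.devices if d.free_gib >= size_gib]
        if not candidates:
            raise RuntimeError(f"no device can hold {size_gib} GiB")
        target = max(candidates, key=lambda d: d.free_gib)
        target.used_gib += size_gib
        self.volumes[name] = (target.name, size_gib)
        return target.name


pool = Pool([Device("nvme0", 1000), Device("nvme1", 2000)])
first = pool.provision("pv-1", 500)    # lands on the emptier device
second = pool.provision("pv-2", 1200)  # only nvme1 still has room
third = pool.provision("pv-3", 800)    # nvme1 is nearly full, falls to nvme0
print(first, second, third)
```

Note that a volume here lives on exactly one device, mirroring the point made above that DMP volumes themselves provide no redundancy.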
After we review the storage subsystem, we will provide overviews of two workloads that are priorities for initial DMP deployments. The first is a Spark-based AI/analytics pipeline that uses Minio's S3-compatible object store as a replacement for HDFS. The second is a MySQL/MariaDB transactional database on shared storage. To the best of our knowledge, this is the first open source transactional database that supports shared storage.
Finally, we'll conclude with an update on the status of the DMP effort, review preliminary performance results, and provide a few parting thoughts on the next steps for the DMP.
David Cohen, Intel
Dave Cohen is the Storage Solutions CTO in Intel's Nonvolatile Memory and Storage (NVMS) Group and the Chief Architect of Intel's Data Management Platform (DMP). The DMP is a rack-centric physical cluster that builds on the Intel Rack Scale Design (RSD). Dave has been at Intel for five years, working on a variety of network and storage related solutions. His career spans over 30 years of work on large-scale distributed systems for Fortune 500 companies across several industry segments.
Philipp Reisner, LINBIT
Philipp Reisner is the founder and CEO of LINBIT and the author of DRBD. LINBIT grew from a local Linux service provider into an international support provider for open source HA cluster software (focusing particularly on Pacemaker and DRBD). DRBD was started in 2000 and became part of the Linux kernel in 2.6.33. Philipp has guided LINBIT's upstream contributions to a number of open source infrastructure and clustering projects, including Kubernetes and OpenStack Cinder.
author = {David Cohen and Phillip Reisner},
title = {The Storage Architecture of Intel{\textquoteright}s Data Management Platform ({{DMP}})},
year = {2019},
address = {Boston, MA},
publisher = {USENIX Association},
month = feb
}