{RaMP}: A Lightweight {RDMA} Abstraction for Loosely Coupled Applications

Babar Naveed Memon; Meghna Pancholi; Xiayue Charles Lin; Siyuan Hu; Mosharaf Chowdhury; Arshia Mufti; Yuan He; Arthur Scott Wesley; Christina Delimitrou; Tim Brecht; Ion Stoica; Kenneth Salem; Bernard Wong; Benjamin Cassell; Naved Ansari; Kyle Hogan; Charles Munson; Larry Rudolph; Gene Cooperman; Peter Desnoyers; Orran Krieger

All sessions will be held in the Essex Ballroom Center unless otherwise noted.

Papers are available for download below to registered attendees now and to everyone beginning July 9, 2018. Paper abstracts are available to everyone now. Copyright to the individual works is retained by the author[s].

Downloads for Registered Attendees
(Sign in to your USENIX account to download these files.)

Attendee Files

HotCloud '18 Attendee List (PDF)

HotCloud '18 Paper Archive (ZIP)

HotCloud '18 Paper Archive (ZIP) (Updated 7/9/18)

Monday, July 9, 2018

7:30 am–8:00 am

Continental Breakfast

8:00 am–9:50 am

Streaming and Graphing

Session Chair: Indranil Gupta, University of Illinois at Urbana–Champaign

Reinventing Video Streaming for Distributed Vision Analytics

Chrisma Pakha, University of Chicago; Aakanksha Chowdhery, Google; Junchen Jiang, University of Chicago/Microsoft

Available Media

Driven by the ubiquity of camera-equipped devices and the prohibitive cost of modern vision techniques, we see a growing need for a custom video streaming protocol that streams videos from cameras to cloud servers to perform neural-network-based video analytics. In the past decade, numerous efforts have optimized video streaming protocols to provide better quality-of-experience to users. In this paper, we call upon this community to similarly develop custom streaming protocols for better analytics quality (accuracy) of vision analytics (deep neural networks). We highlight new opportunities to substantially improve the tradeoffs between bandwidth usage and inference accuracy. The key insight is that existing streaming protocols are essentially client (camera)-driven; in contrast, by letting the analytics server decide what/when to stream from the camera, the new protocols can directly optimize the inference accuracy while minimizing bandwidth usage. Preliminary evaluation shows that a simple protocol can reduce bandwidth consumption by 4-23x compared to traditional streaming protocols and other distributed video analytics pipelines while maintaining at least 95% inference accuracy.

Rethinking Adaptability in Wide-Area Stream Processing Systems

Albert Jonathan, Abhishek Chandra, and Jon Weissman, University of Minnesota Twin Cities

Available Media

Adaptability is an important property of stream processing systems since the systems need to maintain low latency and high throughput execution of long-running queries. In a wide-area environment, dynamics are common not only because of the workload variability but also the nature of wide-area network (WAN) bandwidth that frequently changes. In this work, we study the adaptability property of stream processing systems designed for a wide-area environment. Specifically, we (1) discuss the challenges of reconfiguring query executions in a wide-area environment, (2) propose ideas how to adapt existing reconfiguration techniques used in centralized Cloud to a wide-area environment, and (3) discuss the trade-offs between them. A key finding is that the best adaptation technique to use depends on the network conditions, types of query, and optimization metrics.

RecService: Distributed Real-Time Graph Processing at Twitter

Ajeet Grewal, Jerry Jiang, Gary Lam, Tristan Jung, Lohith Vuddemarri, Quannan Li, and Aaditya Landge, Twitter; Jimmy Lin, University of Waterloo

Available Media

We present RecService, a distributed real-time graph processing engine that drives billions of recommendations on Twitter. Real-time recommendations are framed in terms of a user's social context and real-time events incident on that social context, generated from ad hoc point queries and long-lived standing queries. Results form the basis of downstream processes that power a variety of recommendation products. A noteworthy aspect of the system's design is a partitioning scheme whereby manipulations of graph adjacency lists are local to a cluster node. This eliminates cross-node network traffic in query execution, enabling horizontal scalability and avoiding "hot spots" caused by vertices with large degrees.

Towards Fast and Scalable Graph Pattern Mining

Anand Padmanabha Iyer, UC Berkeley; Zaoxing Liu and Xin Jin, JHU; Shivaram Venkataraman, Microsoft Research / University of Wisconsin; Vladimir Braverman, JHU; Ion Stoica, UC Berkeley

Available Media

While there has been a tremendous interest in processing graph-structured data, existing distributed graph processing systems take several minutes or even hours to mine simple patterns on graphs. In this paper, we try to answer the question of whether it is possible to build a graph pattern mining engine that is both fast and scalable. Leveraging the observation that in several pattern mining tasks, providing an approximate answer is good enough, we propose the use of approximation for graph pattern mining. However, we find that existing approximation techniques do not work for this purpose. Based on this, we present a new approach for approximate graph pattern mining that leverages recent advancements in graph approximation theory. Our preliminary evaluations show encouraging results: compared to state-of-the-art, finding 3-motifs in Twitter graph is 165× faster while incurring only 5% error. We conclude by discussing several systems challenges to make our proposal practical.

Monarch: Gaining Command on Geo-Distributed Graph Analytics

Anand Padmanabha Iyer, UC Berkeley; Aurojit Panda, NYU; Mosharaf Chowdhury, University of Michigan; Aditya Akella, University of Wisconsin; Scott Shenker and Ion Stoica, UC Berkeley

Available Media

A number of existing and emerging application scenarios generate graph-structured data in a geo-distributed fashion. Although there is a lot of interest in distributed graph processing systems, none of them support graphs that are geo-distributed. Geo-distributed analytics, on the other hand, has not focused on iterative workloads such as distributed graph processing.

In this paper, we look at the problem of efficient geo-distributed graph analytics. We find that optimizing the iterative processing style of graph-parallel systems is the key to achieving this goal rather than extending existing geo-distributed techniques to graph processing. Based on this, we discuss our proposal on building Monarch, the first system to our knowledge that focuses on geo-distributed graph processing. Our preliminary evaluation of Monarch shows encouraging results.

9:50 am–10:10 am

Break with Refreshments

10:10 am–12:00 pm

In between the Clouds and Networking

Session Chair: Jonathan Mace, Max Planck Institute for Software Systems

Steel: Simplified Development and Deployment of Edge-Cloud Applications

Shadi A. Noghabi, University of Illinois at Urbana-Champaign; John Kolb, University of California, Berkeley; Peter Bodik and Eduardo Cuervo, Microsoft Research

Available Media

The rapid growth in both the number and variety of cloud services has led to the emergence of complex applications using multiple cloud services (typically > 5). Although building cloud-based applications is relatively simple, extending them to an edge-cloud environment is complicated, error-prone, and time-consuming. Effectively using the edge requires dynamic adaptation and movement of services between edges and the cloud, e.g., in presence of failures or load spikes. However, current platforms do not support this. We propose Steel, a high-level abstraction that simplifies development, enables transparent adaptation and migration, and automates deployment and monitoring across the entire edge-cloud environment. Additionally, Steel enables modular and pluggable optimizations, such as placement, load balancing, and dynamic communication, with minimal effort. Based on our evaluation, we reduce the initial development effort (1.7x – 3.5x reduction in lines of config), support dynamic moves with minimal changes (∼2 lines of config per move, reducing 95% of the overhead), and support easy development of optimizations (e.g., a placement optimizer requires ∼500 lines of code).

To Relay or Not to Relay for Inter-Cloud Transfers?

Fan Lai, Mosharaf Chowdhury, and Harsha Madhyastha, University of Michigan

Available Media

Efficient big data analytics over the wide-area network (WAN) is becoming increasingly more popular. Current geo-distributed analytics (GDA) systems employ WAN-aware optimizations to tackle WAN heterogeneities. Although extensive measurements on public clouds suggest the potential for improving inter-datacenter data transfers via detours, we show that such optimizations are unlikely to work in practice. This is because the widely accepted mantra used in a large body of literature – WAN bandwidth has high variability – can be misleading. Instead, our measurements across 40 datacenters belonging to Amazon EC2, Microsoft Azure, and Google Cloud Platform show that the available WAN bandwidth is often spatially homogeneous and temporally stable between two virtual machines (VMs) in different datacenters, even though it can be heterogeneous at the TCP flow level. Moreover, there is little scope for either bandwidth or latency optimization in a cost-effective manner via relaying. We believe that these findings will motivate the community to rethink the design rationales of GDA systems and geo-distributed services.

NodeFinder: Scalable Search over Highly Dynamic Geo-distributed State

Azzam Alsudais, University of Colorado Boulder; Zhe Huang, Bharath Balasubramanian, and Shankaranarayanan Puzhavakath Narayanan, AT&T Labs Research; Eric Keller, University of Colorado Boulder; Kaustubh Joshi, AT&T Labs Research

Available Media

Finding nodes with certain criteria is a critical need for many cloud services. For example, a cloud monitoring service needs to query thousands of hosts in a data-center to check for resource usage while a cloud homing service needs to find edge data centers across the world that satisfy certain complex constraints. This is a challenging problem, especially when confronted with highly dynamic state, scale on the order of hundreds or even thousands, geo-distribution and complex query constraints that traverse decentralized data sources. In this paper, we address this problem through the design of NodeFinder that is based on a novel pull-based approach in which we maintain decentralized (peer-to-peer) groups of nodes structured according to the node attribute values (i.e., their state). This allows queries to be sent to only a few representatives of the groups that have the potential of satisfying the constraints, and then the representatives gossip with their peers and return the latest set of nodes. This guarantees freshness of results, and ensures directed and thereby scalable querying. We show NodeFinder's use in production use-cases such as host monitoring in our OpenStack clouds and NFV homing on edge clouds. Our preliminary experiments on Amazon EC2 illustrate NodeFinder's scalability and efficiency as compared to today's approaches.

CacheCloud: Towards Speed-of-light Datacenter Communication

Shelby Thomas, Geoffrey M. Voelker, and George Porter, University of California, San Diego

Available Media

The network is continuing to advance unabated, with 100-Gb/s Ethernet already a commercial reality, and now 400-Gb/s in the standardization process. Within a single rack, inter-server latency will soon be in the range of 250ns, trending ever closer towards the fundamental propagation delay of light in a fiber. In this paper we argue that in this environment, a major performance bottleneck is DRAM latency, which has stagnated at 100ns per access. Consequently, data should be kept entirely in the CPU cache which has an order of magnitude lower latency and RAM should be considered a slower backing store. We describe the implications of designing a "speed of light'' datacenter network stack that can keep up with ever increasing link speeds with the goal of keeping latency as close to the speed of light propagation time as possible.

Packet-Level Analytics in Software without Compromises

Oliver Michel, University of Colorado Boulder; John Sonchack, University of Pennsylvania; Eric Keller, University of Colorado Boulder; Jonathan M. Smith, University of Pennsylvania

Available Media

Traditionally, network monitoring and analytics systems rely on aggregation (e.g., flow records) or sampling to cope with the high data rates large-scale networks operate on. This has the downside that, in doing so, we lose data granularity and accuracy, and in general limit the possible network analytics we can perform. Recent proposals leveraging software-defined networking or programmable hardware provide more fine-grained, per-packet monitoring but still are based on the fundamental principle of data reduction before being processed. In this paper, we provide a first step towards a cloud-scale, packet-level monitoring and analytics system based on stream processing entirely in software. Software provides virtually unlimited programmability and makes modern (e.g., machine-learning) network analytics applications possible. We identify unique features of network analytics applications which enable the specialization of stream processing systems. As a result, an evaluation with our preliminary implementation shows that we can scale up to several million packets per second per core and together with load balancing and further optimizations, the vision of cloud-scale per-packet network analytics is possible.

12:00 pm–1:00 pm

Luncheon for Workshop Attendees

Essex Ballroom South

1:00 pm–2:50 pm

Big Picture and File Systems

Session Chair: Shivram Venkatraman, University of Wisconsin

Go Serverless: Securing Cloud via Serverless Design Patterns

Sanghyun Hong, University of Maryland, College Park; Abhinav Srivastava and William Shambrook, Frame.io; Tudor Dumitraș, University of Maryland, College Park

Available Media

Due to the shared responsibility model of clouds, tenants have to manage the security of their workloads and data. Developing security solutions using VMs or containers creates further problems as these resources also need to be secured. In this paper, we advocate for taking a serverless approach by proposing six serverless design patterns to build security services in the cloud. For each design pattern, we describe the key advantages and present applications and services utilizing the pattern. Using the proposed patterns as building blocks, we introduce a threat-intelligence platform that collects logs from various sources, alerts malicious activities, and take actions against such behaviors. We also discuss the limitations of serverless design and how future implementations can overcome those limitations.

Making Cloud Easy: Design Considerations and First Components of a Distributed Operating System for Cloud

Wolfgang John, Joacim Halen, Xuejun Cai, Chunyan Fu, Torgny Holmberg, Vladimir Katardjiev, Tomas Mecklin, and Mina Sedaghat, Pontus Sköldström, Daniel Turull, Vinay Yadhav, and James Kempf, Ericsson Research

Available Media

Cloud offerings have over the years been transformed from bare-metal to virtual machines, to containers, and most recently to serverless functions. Each of these execution context abstractions has been accompanied by a new layer of centralized management, but these extra management layers have led to a confusing and fragile system, that wastes developer time on managing execution contexts with little contribution to the application. We argue that it is time to revisit ideas of Single System Image (SSI) concepts to simplify the management of cloud execution contexts. An SSI abstraction for Cloud provides easy and convenient developer access to resources without recourse to programming multiple levels of execution contexts. We propose a novel set of design principles inspired by earlier distributed operating system and SSI research. We also present a first corresponding service, realizing fully decentralized resource management, folding multiple layers of centralized management stacks into a single layer spanning across datacenter resources. As a result, developers never see individual execution environments, but deal with processes and IPC familiar from local development machines.

Seer: Leveraging Big Data to Navigate the Increasing Complexity of Cloud Debugging

Yu Gan, Meghna Pancholi, Dailun Cheng, Siyuan Hu, Yuan He, and Christina Delimitrou, Cornell University

Available Media

Performance unpredictability in cloud services leads to poor user experience, degraded availability, and has revenue ramifications. Detecting performance degradation a posteriori helps the system take corrective action, but does not avoid the QoS violations. Detecting QoS violations after the fact is even more detrimental when a service consists of hundreds of thousands of loosely-coupled microservices, since performance hiccups can quickly propagate across the dependency graph of microservices. In this work we focus on anticipating QoS violations in cloud settings to mitigate performance unpredictability to begin with.

We propose Seer, a cloud runtime that leverages the massive amount of tracing data cloud systems collect over time and a set of practical learning techniques to signal upcoming QoS violations, as well as identify the microservice(s) causing them. Once an imminent QoS violation is detected Seer uses machine-level hardware events to determine the cause of the QoS violation, and adjusts the resource allocations to prevent it. In local clusters with 10 40-core servers and 200-instance clusters on GCE running diverse cloud microservices, we show that Seer correctly anticipates QoS violations 91% of the time, and attributes the violation to the correct microservice in 89% of cases. Finally, Seer detects QoS violations early enough for a corrective action to almost always be applied successfully.

A Case for Packing and Indexing in Cloud File Systems

Saurabh Kadekodi, Carnegie Mellon University; Bin Fan and Adit Madan, Alluxio, Inc.; Garth A. Gibson, Carnegie Mellon University, Vector Institute; Gregory R. Ganger, Carnegie Mellon University

Available Media

Small (kilobyte-sized) objects are the bane of highly scalable cloud object stores. Larger (at least megabyte-sized) objects not only improve performance, but also result in orders of magnitude lower cost, due to the current operation-based pricing model of commodity cloud object stores. For example, in Amazon S3's current pricing scheme, uploading 1GiB data by issuing 4KiB PUT requests (at 0.0005 cents each) is approximately $57\times$ more expensive than storing that same 1GiB for a month. To address this problem, we propose client-side packing of small immutable files into gigabyte-sized \textit{blobs} with embedded indices to identify each file's location. Experiments with a packing implementation in Alluxio (an open-source distributed file system) illustrate the potential benefits, such as simultaneously increasing file creation throughput by up to 60$\times$ and decreasing cost to $1/25000$ of the original.

Cloud-Native File Systems

Remzi Arpaci-Dusseau and Andrea Arpaci-Dusseau, University of Wisconsin-Madison; Venkat Venkataramani, Rockset, Inc.

Available Media

We present the case for cloud-native system design, focused on the creation of CNFS, a local file system built specifically for the cloud era. We first present numerous storage and CPU design principles that any cloud-native storage system should consider; we demonstrate the utility of these principles through the design of CNFS. CNFS is a hierarchical, copy-on-write file system that migrates data and metadata across cloud storage volumes to meet user objectives, and harnesses remote CPU workers to perform critical background work such as migration and compression.

2:50 pm–3:10 pm

Break with Refreshments

3:10 pm–4:20 pm

Security

Session Chair: Christina Delimitrou, Cornell University

A Secure Cloud with Minimal Provider Trust

Amin Mosayyebzadeh and Gerardo Ravago, Boston University; Apoorve Mohan, Northeastern University; Ali Raza and Sahil Tikale, Boston University; Nabil Schear, MIT Lincoln Laboratory; Trammell Hudson, Two Sigma; Jason Hennessey, Boston University and NetApp; Naved Ansari, Boston University; Kyle Hogan, MIT; Charles Munson, MIT Lincoln Laboratory; Larry Rudolph, Two Sigma; Gene Cooperman and Peter Desnoyers, Northeastern University; Orran Krieger, Boston University

Available Media

Bolted is a new architecture for a bare metal cloud with the goal of providing security-sensitive customers of a cloud the same level of security and control that they can obtain in their own private data centers. It allows tenants to elastically allocate secure resources within a cloud while being protected from other previous, current, and future tenants of the cloud. The provisioning of a new server to a tenant isolates a bare metal server, only allowing it to communicate with other tenant's servers once its critical firmware and software have been attested to the tenant. Tenants, rather than the provider, control the tradeoffs between security, price, and performance. A prototype demonstrates scalable end-to-end security with small overhead compared to a less secure alternative.

Enforcing Context-Aware BYOD Policies with In-Network Security

Adam Morrison, Rice University; Lei Xue, The Hong Kong Polytechnic University; Ang Chen, Rice University; Xiapu Luo, The Hong Kong Polytechnic University

Available Media

Bring Your Own Device, or BYOD, has become the new norm for many enterprise networks; but it also raises security concerns. We present our vision of programmable in-network security, and sketch an initial system design, Poise. Poise has a high-level policy language that can express a wide range of existing and new security policies. These policies can then be compiled to device con- figurations to collect device/apps information, as well as switch programs in P4 that enforce security inside the network. Our initial results seem promising — Poise runs with reasonable overhead, and it successfully detects policy violations for seven useful BYOD policies.

A Side-channel Attack on HotSpot Heap Management

Xiaofeng Wu, Kun Suo, Yong Zhao, and Jia Rao, The University of Texas at Arlington

Available Media

CPU time-multiplexing is a common practice in multi-tenant systems to improve system utilization. However, the sharing of CPU and a single system clock makes it difficult for programs to accurately measure the length of an operation. Since a program is not always running in a time-sharing system but the system clock always advances, time perceived by one program could be dilated as it may include the run time of another program. Applications employing time-based resource management face a potential security threat of time manipulation.

HotSpot, a widely used Java virtual machine (JVM), relies on timing garbage collections to infer an appropriate heap size. In this paper, we present a new active side-channel attack that exploits time dilation to break the heap sizing algorithm in parallel scavenge, the default garbage collector in JDK 8. We demonstrate that a deliberate attack targeting a specific type of GC is able to crash a Java program with out-of-memory errors, cause excessive garbage collection, and leads to significant memory waste due to a bloated heap.

4:20 pm–4:30 pm

Short Break

4:30 pm–6:00 pm

Rage against the Machine

Session Chair: Remzi Arpaci-Dusseau, University of Wisconsin-Madison

RECap: Run-Escape Capsule for On-demand Managed Service Delivery in the Cloud

Shripad Nadgowda, Sahil Suneja, and Canturk Isci, IBM Research

Available Media

Application runtimes are undergoing a fundamental transformation in the cloud, from general-purpose op- erating systems (OSes) in virtual machines (VMs) to lightweight, minimal OSes in microcontainers. On one hand, such transformation is helping reduce application footprint in the cloud to increase agility, density and to minimize attack surface. On the other hand it makes it challenging to implement system and application man- agement tasks. Inspired from the on-demand Function as a Service (FaaS) model in serverless computing, in RECap we are designing a cloud-native solution to deliver core systems and application management tasks through specially-managed Capsule containers. Capsule containers are dynamically attached to the running containers for the duration of their implemented function and are safely removed from application context afterwards. More generally, RECap framework allows us to design disaggregated on-demand managed service delivery for containers in the cloud. In this paper, we describe the motivation and the emerging opportunity for RECap in the cloud. We discuss its core design principles, performance, security and manageability trade-offs. We present current design of RECap for the Kubernetes platform.

Say Goodbye to Virtualization for a Safer Cloud

Dan Williams, Ricardo Koller, and Brandon Lum, IBM T.J. Watson Research Center

Available Media

When it comes to isolation on the cloud, conventional wisdom holds that virtual machines (VMs) provide greater isolation than containers because of their low-level interface to the host. A lower-level interface reduces the amount of code and complexity needed in the kernel that must be relied upon for isolation. However, it is incorrectly assumed that virtualization mechanisms are required to achieve a low-level interface suitable for isolation. In this paper, we argue that the interface to the host can be lowered for any application by moving kernel components to userspace. We show that using a userspace network stack results in a 33% reduction in kernel code usage, which is 20% better than when resorting to virtualization mechanisms and using a VM.

From JVM to FPGA: Bridging Abstraction Hierarchy via Optimized Deep Pipelining

Jason Cong, Peng Wei, and Cody Hao Yu, University of California, Los Angeles

Available Media

FPGAs (field-programmable gate arrays) can be flexibly reconfigured to accelerate many computation kernels with orders-of-magnitude performance/watt improvement, making FPGA-based heterogeneous systems a promising approach to driving continuous performance and energy improvement in today's datacenters. However, the significant gains on computation kernels are often considerably offset by the extra data transfer overhead, resulting in considerably reduced system-wide speedup, or even slowdown. In this paper we propose a fully pipelined data transfer stack that achieves efficient JVM-FPGA communication through extensive pipelining. Also, we introduce a programming framework that automatically generates most of the pipeline code, freeing users from the bothersome details of FPGA management. Furthermore, we address the issue of multi-stage pipeline throughput optimization by formulating it into an integer linear programming problem and applying its solution for generating the optimal pipeline implementation. Experiments show that the proposed pipeline stack achieves 4.9x speedup for various computation kernels.