Workshop Program

All sessions will be held in the Crystal Room unless otherwise noted.

The full papers published by USENIX for the workshop are available below, as a complete archive or individually, to workshop registrants immediately and to everyone else beginning June 25, 2013. Abstracts are open to everyone immediately. Copyright to the individual works is retained by the author[s].


Tuesday, June 25, 2013

8:15 a.m.–9:00 a.m. Tuesday

Continental Breakfast

Market Street Foyer

9:00 a.m.–10:00 a.m. Tuesday

Keynote Address

Dystopia as a Service

Adrian Cockcroft, Netflix

We have spent years striving to build perfect apps running on perfect kernels on perfect CPUs connected by perfect networks, but this utopia hasn't really arrived. Instead we live in a dystopian world of buggy apps changing several times a day running on JVMs running on an old version of Linux running on Xen running on something I can't see, that only exists for a few hours, connected by a network of unknown topology and operated by many layers of automation. I will discuss the new challenges and demands of living in this dystopian world of cloud-based services. I will also give an overview of the Netflix open source cloud platform (see netflix.github.com) that we use to create our own island of utopian agility and availability regardless of what is going on underneath.

Adrian is the Director of Architecture for the Cloud Systems team at Netflix, where he leads the Netflix Open Source Software program and the Cloud Prize. Before joining Netflix in 2007, Adrian was a founding member of eBay Research Labs. He spent 16 years at Sun Microsystems, including a stint as Distinguished Engineer and chief architect for Sun's High Performance Technical Computing group. Adrian authored two editions of Sun Performance and Tuning: Java and the Internet, and co-authored two Sun Blueprint books, Resource Management and Capacity Planning for Internet Services.


10:00 a.m.–10:30 a.m. Tuesday

Break with Refreshments

Market Street Foyer

10:30 a.m.–12:30 p.m. Tuesday

Virtualization

Inception: Towards a Nested Cloud Architecture

Changbin Liu and Yun Mao, AT&T Labs-Research

Despite the increasing popularity of Infrastructure-as-a-Service (IaaS) clouds, providers have been very slow to adopt many innovative technologies, such as live VM migration, dynamic resource management, and VM replication. In this paper, we argue that the reasons are not only technical but also fundamental, owing to a lack of transparency and a conflict of interest between providers and customers. We present our vision, Inception, a nested IaaS cloud architecture that overcomes this impasse. Inception clouds are built entirely on top of resources acquired from today's clouds, and provide nested VMs to end users. We discuss the benefits, use cases, and challenges of inception clouds, and present our network design and prototype implementation.


HVX: Virtualizing the Cloud

Alex Fishman, Mike Rapoport, Evgeny Budilovsky, and Izik Eidus, Ravello Systems

Nowadays there is significant diversity among Infrastructure-as-a-Service (IaaS) clouds. The differences span virtualization technology and hypervisors, storage and network configuration, and the cloud management APIs. These differences make migration of a VM (or a set of VMs) from a private cloud into a public cloud, or between different public clouds, complicated or even impractical for many use cases.

HVX is a virtualization platform that completely abstracts the underlying cloud infrastructure from the application virtual machines. HVX allows deployment of existing VMs into the cloud without any modifications, mobility between clouds, and easy duplication of an entire deployment.

HVX can be deployed on almost any existing IaaS cloud. Each HVX instance packs in a nested hypervisor along with the virtual hardware, network, and storage configuration.

Combined with an image store and management APIs, HVX can be used to create a virtual cloud that uses an existing cloud provider's infrastructure as its hardware, rather than physical servers, switches, and storage.


Thinner Clouds with Preallocation

Ittay Eyal, Technion; Flavio Junqueira, Microsoft Research; Idit Keidar, Technion

Different companies sharing the same cloud infrastructure often prefer to run their virtual machines (VMs) in isolation, i.e., one VM per physical machine (PM) core, due to security and efficiency concerns. To accommodate load spikes, e.g., those caused by flash-crowds, each service is allocated more machines than its instantaneous load requires. However, the flash-crowds of different hosted services are not correlated, so at any given time only a subset of the machines is in use.

We present here the concept of preallocation—having a single physical machine ready to quickly run one of a few possible VMs, without ever running more than one at a given time. The preallocated VMs are initialized and then paused by the hypervisor. We suggest a greedy preallocation strategy and evaluate it by simulation, using workloads based on previous analyses of flash-crowds. We observe a reduction of 35-50% in the number of PMs used compared with classical dynamic allocation. This means that a datacenter can provide per-service isolation with 35-50% fewer PMs.
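
As a rough sketch of the greedy strategy just described, the Python fragment below packs standby VMs onto PMs, always choosing the least-loaded machine with a free slot; the slot model, function names, and example numbers are illustrative assumptions, not the paper's algorithm.

```python
def greedy_preallocate(services, slots_per_pm, num_pms):
    """services: {service name: number of standby VMs wanted}.
    Returns {(service, vm index): PM index}. Each PM holds several
    initialized-but-paused VMs of *different* services and may run
    at most one of them at a time."""
    pms = [{"load": 0, "services": set()} for _ in range(num_pms)]
    placement = {}
    for service, num_standby in services.items():
        for vm in range(num_standby):
            # candidate PMs: a free slot and no VM of this service yet
            candidates = [i for i, p in enumerate(pms)
                          if service not in p["services"]
                          and p["load"] < slots_per_pm]
            if not candidates:
                raise RuntimeError("no PM has room for " + service)
            best = min(candidates, key=lambda i: pms[i]["load"])
            pms[best]["load"] += 1
            pms[best]["services"].add(service)
            placement[(service, vm)] = best
    return placement

# 6 standby VMs fit on 3 PMs instead of 6 dedicated ones (a 50% saving)
print(greedy_preallocate({"a": 2, "b": 2, "c": 2}, slots_per_pm=2, num_pms=3))
```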


FluidCloud: An Open Framework for Relocation of Cloud Services

Andy Edmonds, Zürcher Hochschule für Angewandte Wissenschaften; Thijs Metsch, Intel Ireland Limited; Dana Petcu, Institute e-Austria Timisoara; Erik Elmroth, Umeå University; Jamie Marshall, Prologue; Plamen Ganchosov, CloudSigma

Cloud computing delivers a new level of connectedness, in contrast to the once-disconnected PC-type systems. The proposal in this paper extends that connectedness so that cloud service instances, hosted by providers, can relocate between clouds. This is key to delivering economic and regulatory benefits, and more importantly liberation and positive market disruption.

While service providers want to lock in their customers' services, FluidCloud aims to liberate them, thereby allowing the service owner to freely choose the best-matching provider at any time. In a cloud world of competing standards and software solutions, each only partially complete, the central research question this paper intends to answer is: how can relocation of service instances between clouds be intrinsically enabled and fully automated?


 

12:30 p.m.–1:45 p.m. Tuesday

FCW Luncheon

Imperial Ballroom

1:45 p.m.–3:45 p.m. Tuesday

Networking and Bandwidth

NicPic: Scalable and Accurate End-Host Rate Limiting

Sivasankar Radhakrishnan, University of California, San Diego; Vimalkumar Jeyakumar, Stanford University; Abdul Kabbani, Google Inc.; George Porter, University of California, San Diego; Amin Vahdat, Google Inc. and University of California, San Diego

The degree of multiplexing in datacenters necessitates careful resource scheduling, and network bandwidth is no exception. Unfortunately, today we have little ability to accurately schedule network traffic with low overhead on end-hosts. This paper presents NicPic, a system that enables accurate network traffic scheduling in a scalable fashion. The key insight in NicPic is to decouple state management from packet scheduling, assigning the former to the CPU and the latter to the NIC. The CPU is only involved in classifying packets, enqueueing them in per-class queues maintained in host memory, and specifying rate limits for each traffic class. The NIC handles packet scheduling and transmission in real time. In this paper, we present the design of NicPic, which offers a scalable solution for transmit scheduling in future high-speed NICs.
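
A toy model of this division of labor might look as follows; the class names, the token-bucket pacing, and the one-second bucket depth are assumptions made for illustration, not NicPic's actual mechanism.

```python
import collections
import time

class HostSide:
    """CPU duties only: classify packets into per-class queues kept in
    host memory and set a rate limit per class."""
    def __init__(self):
        self.queues = collections.defaultdict(collections.deque)
        self.rate_bps = {}  # class id -> bytes per second

    def set_rate(self, cls, bytes_per_sec):
        self.rate_bps[cls] = bytes_per_sec

    def enqueue(self, cls, packet):
        self.queues[cls].append(packet)

class NicSide:
    """NIC duties: real-time pacing, one token bucket per class."""
    def __init__(self, host):
        self.host = host
        self.tokens = collections.defaultdict(float)
        self.last = time.monotonic()

    def poll(self):
        """Transmit whatever the per-class rate limits currently allow."""
        now = time.monotonic()
        elapsed, self.last = now - self.last, now
        for cls, q in self.host.queues.items():
            rate = self.host.rate_bps.get(cls, 0.0)
            # refill, capping the bucket at one second's worth of bytes
            self.tokens[cls] = min(self.tokens[cls] + rate * elapsed, rate)
            while q and self.tokens[cls] >= len(q[0]):
                self.tokens[cls] -= len(q[0])
                yield cls, q.popleft()

host = HostSide()
host.set_rate("tenant-1", 10_000)      # 10 kB/s for this class
host.enqueue("tenant-1", b"x" * 1500)
nic = NicSide(host)
time.sleep(0.2)                        # let ~2 kB of tokens accrue
print(list(nic.poll()))                # the "NIC" releases the packet
```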


Virtualizing Traffic Shapers for Practical Resource Allocation

Gautam Kumar, Microsoft Research and University of California, Berkeley; Srikanth Kandula, Peter Bodik, and Ishai Menache, Microsoft Research

Many network resource allocation scenarios require traffic shapers, such as weighted fair queues or priority queues, to achieve their goals; however, current switches offer very few such shapers. We show how to virtualize the traffic shapers in switches and network interface cards. Doing so means that one can mimic the traffic shaping behavior of many more shapers (at least an order of magnitude more) using just the small number of shapers available on commodity switches and NICs.

From a prototype on an Arista 7048 and simulations that replay traces from production datacenters, we show early results that indicate feasibility and improvement over simpler alternatives. We also present theory-based intuition as to why such virtualization of shapers is feasible.
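
For intuition, a classic software shaper of the kind being multiplied here is deficit round robin; the sketch below is the textbook algorithm, not the paper's virtualization scheme.

```python
from collections import deque

def drr(queues, weights, quantum=1500):
    """Deficit round robin: queues maps class -> deque of packet sizes;
    yields (class, size) in an order approximating weighted fair shares."""
    deficit = {c: 0 for c in queues}
    while any(queues.values()):
        for c, q in queues.items():
            if not q:
                continue
            deficit[c] += quantum * weights[c]
            while q and q[0] <= deficit[c]:
                size = q.popleft()
                deficit[c] -= size
                yield c, size

qs = {"gold": deque([1500] * 4), "bronze": deque([1500] * 4)}
for cls, size in drr(qs, {"gold": 2, "bronze": 1}):
    print(cls, size)   # gold drains roughly twice as fast as bronze
```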


CloudMirror: Application-Aware Bandwidth Reservations in the Cloud

Jeongkeun Lee, HP Labs; Myungjin Lee, University of Edinburgh; Lucian Popa, Yoshio Turner, Sujata Banerjee, and Puneet Sharma, HP Labs; Bryan Stephenson, HP Enterprise Services

Cloud computing providers today do not offer guarantees for the network bandwidth available in the cloud, preventing tenants from running their applications predictably. To provide guarantees, several recent research proposals offer tenants a virtual cluster abstraction, emulating physical networks. Whereas offering dedicated virtual network abstractions is a significant step in the right direction, in this paper we argue that the abstractions exposed to tenants should aim to model tenant application structures rather than aiming to mimic physical network topologies. The fundamental problem in providing users with dedicated network abstractions is that the communication patterns of real applications do not typically resemble the rigid physical network topologies. Thus, the virtual network abstractions often poorly represent the actual communication patterns, resulting in overprovisioned/wasted network resources and underutilized computational resources.

We propose a new abstraction for specifying bandwidth guarantees, which is easy to use because it closely follows application models; our abstraction specifies guarantees as a graph between application components. We then propose an algorithm to efficiently deploy this abstraction on physical clusters. Through simulations, we show that our approach is significantly more efficient than prior work for offering bandwidth guarantees.
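
A minimal rendering of such a component-graph abstraction could look like the following; the component names, rack IDs, and helper function are hypothetical, and the paper's abstraction and deployment algorithm are considerably richer.

```python
# Bandwidth guarantees expressed on the application's own structure:
# an edge (component, component) -> guaranteed Mbps between them.
app_graph = {
    ("web", "app"): 500,
    ("app", "db"): 200,
}

def cross_rack_reservation(placement, app_graph):
    """Network bandwidth that must actually be reserved, given a
    placement {component: rack id}; co-located components consume
    no fabric bandwidth for their mutual guarantee."""
    return sum(mbps for (a, b), mbps in app_graph.items()
               if placement[a] != placement[b])

# Co-locating web and app leaves only the 200 Mbps app<->db guarantee
# to reserve, instead of provisioning a rigid hose for every VM.
print(cross_rack_reservation({"web": 1, "app": 1, "db": 2}, app_graph))
```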


AIN: A Blueprint for an All-IP Data Center Network

Vasileios Pappas, Hani Jamjoom, and Dan Williams, IBM T. J. Watson Research Center

With both Ethernet and IP powering Data Center Networks (DCNs), one should wonder whether their coexistence is necessary or an unquestioned legacy. Especially in cloud DCNs, the role of layer-2 is diminishing rapidly as the vast majority of applications require only layer-3 connectivity. At the same time, cloud workloads are demanding that DCN architectures better support network scalability, multitenancy, address virtualization, and end-host mobility. This paper argues that today's DCN architectures have a conjoined layer-2 and layer-3 design that is not only unnecessary but counterproductive. We present AIN, a blueprint for a scalable all-IP DCN. AIN implements virtual routers inside hypervisors, eliminating the need for virtual switching. It leverages the proven scalability of routing protocols and avoids unnecessary packet encapsulation, while supporting both multitenancy and end-host mobility. Finally, AIN is compatible with existing applications and is fully realizable with current protocols and hardware.


 

3:45 p.m.–4:15 p.m. Tuesday

Break with Refreshments

Market Street Foyer

4:15 p.m.–5:45 p.m. Tuesday

I/O

A Hidden Cost of Virtualization When Scaling Multicore Applications

Xiaoning Ding, New Jersey Institute of Technology; Phillip B. Gibbons and Michael A. Kozuch, Intel Labs Pittsburgh

As the number of cores in a multicore node increases in accordance with Moore’s law, the question arises as to whether there are any “hidden” costs of a cloud’s virtualized environment when scaling applications to take advantage of larger core counts. This paper identifies one such cost, resulting in up to a 583% slowdown as the multicore application is scaled. Surprisingly, these slowdowns arise even when the application’s VM has dedicated use of the underlying physical hardware and does not use emulated resources. Our preliminary findings indicate that the source of the slowdowns is the intervention from the VMM during synchronization-induced idling in the application, guest OS, or supporting libraries. We survey several possible mitigations, and report preliminary findings on the use of “idleness consolidation” and “IPI-free wakeup” as a partial mitigation.


Priority IO Scheduling in the Cloud

Filip Blagojević, Cyril Guyot, Qingbo Wang, Timothy Tsai, Robert Mateescu, and Zvonimir Bandić, HGST Research

Current state-of-the-art runtime systems built for managing cloud environments almost always assume resource sharing among multiple users and applications. In large part, these runtime systems rely on the node-local operating system to divide the local resources among the applications that share a node. While OSes usually achieve good resource sharing by creating distinct application-level domains across CPUs and DRAM, managing IO bandwidth is a complex task due to the lack of communication between the host and the IO device. In our work we focus on controlling the hard disk drive (HDD) IO bandwidth available to user-level applications in a cloud environment. We introduce priority-based IO scheduling (PBS), where the ordering of IO commands is decided cooperatively by the host and the IO device. We implemented our scheduling policies in the Linux storage stack and the Hadoop Distributed File System. Initial results show that in a cloud environment, real-time commands managed by PBS outperform the real-time IO scheduling of the Linux kernel by up to a factor of ~5 for worst-case latency, and by more than 2x for average latency.
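
The core idea of letting real-time commands overtake best-effort ones can be sketched with a simple two-level priority queue; the API and constants below are illustrative, whereas PBS itself splits the decision cooperatively between the host and the device.

```python
import heapq
import itertools

REALTIME, BEST_EFFORT = 0, 1  # lower value = higher priority

class PriorityIOQueue:
    """Dispatch higher-priority IO commands first; FIFO within a level."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, priority, command):
        heapq.heappush(self._heap, (priority, next(self._seq), command))

    def dispatch(self):
        priority, _, command = heapq.heappop(self._heap)
        return command

q = PriorityIOQueue()
q.submit(BEST_EFFORT, "read block 7")
q.submit(REALTIME, "read block 42")
print(q.dispatch())  # "read block 42" jumps ahead of the earlier command
```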


vPipe: One Pipe to Connect Them All!

Sahan Gamage, Ramana Kompella, and Dongyan Xu, Purdue University

Many enterprises use the cloud to host applications such as web services, big data analytics, and storage. One common characteristic of these applications is that they involve significant I/O activity, moving data from a source to a sink, often without any intermediate processing. However, cloud environments are typically virtualized, with tenants obtaining virtual machines (VMs) that often share CPUs. Virtualization introduces significant overhead for I/O activity, as data needs to be moved across several protection boundaries, and CPU sharing introduces further delays into the overall I/O processing data flow. In this paper, we propose a simple abstraction called vPipe to mitigate these problems. vPipe introduces a simple "pipe" that can connect data sources and sinks, which can be either files or TCP sockets, at the virtual machine monitor (VMM) layer. Shortcutting the I/O at the VMM layer achieves significant CPU savings and avoids the scheduling latencies that degrade I/O throughput. Our evaluation of a vPipe prototype on Xen shows that vPipe can improve file transfer throughput significantly while reducing overall CPU utilization.
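
A rough user-space analogy to this shortcutting is Linux's sendfile(2), which moves file bytes into a socket inside the kernel, skipping the application-level copy much as vPipe splices source to sink at the VMM layer; the helper below is a generic sketch, not vPipe's interface.

```python
import os
import socket

def pipe_file_to_socket(path, sock):
    """Stream a file into a connected TCP socket without copying the
    data through this process's user-space buffers (Linux sendfile)."""
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        offset = 0
        while offset < size:
            sent = os.sendfile(sock.fileno(), f.fileno(), offset,
                               size - offset)
            if sent == 0:
                break  # peer closed the connection
            offset += sent
```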


 

Wednesday, June 26, 2013

8:15 a.m.–9:00 a.m. Wednesday

Continental Breakfast

Market Street Foyer

9:00 a.m.–10:00 a.m. Wednesday

Keynote Address

How I Learned to Stop Worrying and Trust the Cloud

Mike Dahlin, The University of Texas at Austin

Companies and individuals are understandably worried about entrusting their precious data and processing to the cloud. After all, cloud services appear to be complex black boxes operated in unknown ways by a third party. And, when a major outage hits the front page of the papers, one cannot help but worry.

In this talk, I argue that we can make cloud services highly trustworthy both by extending the limits of fault tolerance techniques and by designing systems to support end-to-end correctness checks. Better still, cloud providers are in a unique position to protect their customers from a range of threats. We, as researchers, should further explore these opportunities that will arise as we stop worrying and trust the cloud.


10:00 a.m.–10:30 a.m. Wednesday

Break with Refreshments

Market Street Foyer

10:30 a.m.–12:30 p.m. Wednesday

Security, Mobile, and Big Data

Jobber: Automating Inter-Tenant Trust in the Cloud

Andy Sayler, Eric Keller, and Dirk Grunwald, University of Colorado, Boulder

Today, a growing number of users are opting to move their systems and services from self-hosted data centers to cloud-hosted IaaS offerings. These users wish to benefit from the efficiencies that shared multi-tenant hosting can offer while retaining or improving the kinds of security and control afforded by self-hosted solutions. In this paper, we present Jobber: a highly autonomous multi-tenant network security framework designed to handle both the dynamic nature of cloud datacenters and the desire for optimized inter-tenant communication. Our Jobber prototype leverages principles from Software Defined Networking and Introduction Based Routing to build an inter-tenant network policy solution capable of automatically allowing optimized communication between trusted tenants while blocking or rerouting traffic from untrusted tenants. Jobber automatically responds to the frequent changes in virtualized data center topologies and, unlike traditional security solutions, requires minimal manual configuration, cutting down on configuration errors.
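
A toy policy lookup conveys the flavor of trust-driven forwarding; the trust scores, thresholds, and action names below are invented for illustration, and Jobber derives trust automatically through introductions rather than from a static table.

```python
ALLOW, REROUTE, BLOCK = "allow", "reroute-via-middlebox", "block"

def policy_for(src, dst, trust):
    """trust: {(src tenant, dst tenant): score in [0, 1]}.
    Map a pairwise trust score to a forwarding action."""
    score = trust.get((src, dst), 0.0)
    if score >= 0.8:
        return ALLOW    # optimized direct path between trusting tenants
    if score >= 0.3:
        return REROUTE  # partially trusted traffic gets inspected
    return BLOCK        # unknown or untrusted tenants are cut off

trust = {("tenant-a", "tenant-b"): 0.9, ("tenant-c", "tenant-b"): 0.4}
print(policy_for("tenant-a", "tenant-b", trust))  # allow
print(policy_for("tenant-c", "tenant-b", trust))  # reroute-via-middlebox
print(policy_for("tenant-x", "tenant-b", trust))  # block
```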


Toward Secure and Convenient Browsing Data Management in the Cloud

Chuan Yue, University of Colorado, Colorado Springs

Cloud- and Web-centric computing is a significant trend in computing. However, the design and development of modern Web browsers have failed to catch up with this trend, leaving unaddressed many challenging Browsing Data Insecurity and Inconvenience (BDII) problems that affect millions of Web users. In this position paper, we present our preliminary investigation of the BDII problems in the five most popular Web browsers and highlight the necessity and importance of addressing them. We also propose to explore a novel Cloud computing Age Browser (CAB) architecture that leverages the reliability and accessibility advantages of cloud storage services to fundamentally address the BDII problems.


Clone2Clone (C2C): Peer-to-Peer Networking of Smartphones on the Cloud

Sokol Kosta, Vasile Claudiu Perta, and Julinda Stefa, Sapienza-Università di Roma; Pan Hui, The Hong Kong University of Science and Technology and Deutsche Telekom Labs; Alessandro Mei, Sapienza-Università di Roma

In this work we introduce Clone2Clone (C2C), a distributed peer-to-peer platform for cloud clones of smartphones. C2C shows the dramatic performance improvement that is made possible by offloading communication between smartphones to the cloud. Along the way toward C2C, we study the performance of device clones hosted in various virtualization environments in both private (local servers) and public (Amazon EC2) clouds. We build the first Amazon Machine Image (AMI) for Android-OS—a key tool for obtaining reliable performance measures of mobile cloud systems—and show how it boosts the performance of Android images on the Amazon cloud service. We then design, build, and implement C2C. Upon it we build CloneDoc, a secure real-time collaboration system for smartphone users. We measure the performance of CloneDoc through experiments on a testbed of 16 Android smartphones and clones hosted on both private and public cloud services. We show that C2C makes it possible to implement distributed execution of advanced peer-to-peer services in a network of mobile smartphones, reducing cellular data traffic by a factor of 3 and saving 99%, 80%, and 30% of the battery for security checks, user status updates, and document editing, respectively.


Transparent and Flexible Network Management for Big Data Processing in the Cloud

Anupam Das, University of Illinois at Urbana-Champaign; Cristian Lumezanu, Yueping Zhang, Vishal Singh, and Guofei Jiang, NEC Labs; Curtis Yu, University of California, Riverside

We introduce FlowComb, a network management framework that helps Big Data processing applications, such as Hadoop, achieve high utilization and low data processing times. FlowComb predicts application network transfers, sometimes before they start, by using software agents installed on application servers and while remaining completely transparent to the application. A centralized decision engine collects data movement information from agents and schedules upcoming flows on paths such that the network does not become congested. Results on our lab testbed show that FlowComb is able to reduce the time to sort 10GB of randomly generated data by 35% while changing paths for only 6% of the transfers.
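
The scheduling step can be approximated by a greedy most-spare-capacity rule, as in this sketch; the path names, capacities, and DecisionEngine API are assumptions for illustration, not FlowComb's interface.

```python
class DecisionEngine:
    """Collects predicted transfers and assigns each to the candidate
    path with the most spare capacity (a deliberately simple policy)."""
    def __init__(self, path_capacity_mbps):
        self.capacity = dict(path_capacity_mbps)
        self.load = {p: 0 for p in self.capacity}

    def place_flow(self, candidate_paths, demand_mbps):
        best = max(candidate_paths,
                   key=lambda p: self.capacity[p] - self.load[p])
        self.load[best] += demand_mbps
        return best

engine = DecisionEngine({"path-A": 1000, "path-B": 1000})
print(engine.place_flow(["path-A", "path-B"], 400))  # path-A
print(engine.place_flow(["path-A", "path-B"], 400))  # path-B (more headroom)
```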


 

12:30 p.m.–2:00 p.m. Wednesday

FCW Luncheon

Market Street Foyer

2:00 p.m.–3:30 p.m. Wednesday

The Need for Speed

Achieving 10Gbps Line-rate Key-value Stores with FPGAs

Michaela Blott, Kimon Karras, Ling Liu, and Kees Vissers, Xilinx Inc.; Jeremia Bär and Zsolt István, ETH Zürich

Distributed in-memory key-value stores such as memcached have become a critical middleware application within current web infrastructure. However, typical x86-based systems yield limited performance scalability and high power consumption, as their architecture, optimized for single-thread performance, is not well matched to the memory-intensive and parallel nature of this application. In this paper we present the design of a novel memcached architecture implemented on Field Programmable Gate Arrays (FPGAs), the first in the literature to achieve 10Gbps line-rate processing for all packet sizes. By transforming the functionality into a dataflow architecture, the implementation not only provides significant speed-up but also operates at lower power consumption than any x86 system. More specifically, with our prototype we measured an increase of up to 36x in the requests per second per Watt that can be serviced, in comparison to the best published numbers for regular servers with optimized software. Additionally, we show that through the tight integration of network interface, memory, and compute, round-trip latency can be reduced to below 4.5 microseconds.


Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph

Rui Wang, Christopher Conrad, and Sam Shah, LinkedIn

Social networks often require the ability to perform low-latency graph computations in the user request path. For example, at LinkedIn, we show the graph distance and common connections whenever a profile is displayed in any context on the site. To do this, we have developed a distributed and partitioned graph system that scales to hundreds of millions of members and their connections, handling hundreds of thousands of queries per second.

To accomplish this scaling, real-time distributed graph traversal is converted into set intersections that are performed in a scatter/gather manner. A network performance bottleneck forms on the gather node, which must merge partial results from many machines. In this paper, we present a modified greedy set cover algorithm that locates a minimal set of machines able to serve the partial results. Our results indicate that we save 25% in the 99th-percentile latency of these graph distance calculations for LinkedIn's social graph workloads.
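
For reference, the standard greedy set cover heuristic that the paper modifies looks like this; the machine and partition identifiers are hypothetical.

```python
def greedy_set_cover(universe, machines):
    """machines: {machine id: set of partition ids it can serve}.
    Returns a small list of machines that together cover `universe`."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # pick the machine covering the most still-uncovered partitions
        best = max(machines, key=lambda m: len(machines[m] & uncovered))
        gained = machines[best] & uncovered
        if not gained:
            raise ValueError("universe not coverable")
        chosen.append(best)
        uncovered -= gained
    return chosen

replicas = {"m1": {1, 2, 3}, "m2": {3, 4}, "m3": {4, 5, 6}}
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, replicas))  # ['m1', 'm3']
```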


Dissecting Open Source Cloud Evolution: An OpenStack Case Study

Salman A. Baset, Chunqiang Tang, Byung Chul Tak, and Long Wang, IBM T.J. Watson Research Center

Open source cloud platforms are playing an increasingly significant role in cloud computing, and these systems have been undergoing rapid development cycles. As an example, OpenStack has grown approximately 10 times in code size since its inception two and a half years ago. Confronted with such fast-paced change, cloud providers are challenged to understand OpenStack's current behavior and to quickly adapt and optimize their provisioned services and configurations as the platform changes. In this work, we use a black-box technique to conduct a deep analysis of four versions of OpenStack. This is the first study in the literature to track the evolution of a popular open source cloud platform. Our analysis reveals important trends in the SQL queries OpenStack issues, helps identify precise points for targeted error injection, and points out potential ways to improve performance (e.g., by changing authentication to PKI). The OpenStack case study in this work demonstrates that our automated black-box methodology aids quick understanding of platform evolution and is critical for effective and rapid consumption of an open source cloud platform.
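
One ingredient of such a black-box analysis is simply profiling the SQL statements a platform issues; the sketch below assumes a plain text query log, a simplification of the paper's methodology.

```python
import collections
import re

def sql_profile(log_lines):
    """Count SELECT/INSERT/UPDATE/DELETE statements in a query log."""
    counts = collections.Counter()
    verb = re.compile(r"^\s*(SELECT|INSERT|UPDATE|DELETE)\b", re.I)
    for line in log_lines:
        m = verb.search(line)
        if m:
            counts[m.group(1).upper()] += 1
    return counts

log = ["SELECT * FROM instances WHERE uuid = 'x'",
       "UPDATE instances SET vm_state = 'active'",
       "SELECT id FROM services"]
print(sql_profile(log))  # Counter({'SELECT': 2, 'UPDATE': 1})
```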


 

3:30 p.m.–4:00 p.m. Wednesday

Break with Refreshments

Market Street Foyer

4:00 p.m.–5:30 p.m. Wednesday

Reliability

The Case for Limping-Hardware Tolerant Clouds

Thanh Do, University of Wisconsin–Madison; Haryadi S. Gunawi, University of Chicago

With the advent of cloud computing, thousands of machines are connected and managed collectively. This era is confronted with a new challenge: performance variability, primarily caused by large-scale management issues such as hardware failures, software bugs, and configuration mistakes. In this paper, we highlight one overlooked cause: limping hardware—hardware whose performance degrades significantly compared to its specification. We present numerous cases of limping disks, networks, and processors seen in production, along with the negative impacts of such failures on existing large-scale distributed systems. From these findings, we advocate the concept of limping-hardware tolerant clouds.
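
The notion can be made concrete with a trivial check of measured throughput against specification; the 50% cutoff below is an arbitrary example, not a threshold from the paper.

```python
def is_limping(measured, spec, tolerance=0.5):
    """True if a device delivers less than `tolerance` of its spec.
    The 50% default is an illustrative cutoff, not from the paper."""
    return measured < tolerance * spec

# e.g., a disk rated for 120 MB/s sequential reads doing only 20 MB/s
print(is_limping(measured=20, spec=120))  # True
```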


Cloud Computing for the Power Grid: From Service Composition to Assured Clouds

György Dán, KTH Royal Institute of Technology; Rakesh B. Bobba, George Gross, and Roy H. Campbell, University of Illinois at Urbana-Champaign

The electric power industry is one of the few industries where cloud computing has not yet found much adoption, even though electric power utilities rely heavily on communications and computation to plan, operate and analyze power systems. In this paper we explore the reasons for this phenomenon. We identify a variety of power system applications that could benefit from cloud computing. We then discuss the security requirements of these applications, and explore the design space for providing the security properties through application layer composition and via assured cloud computing. We argue that a combination of these two approaches will be needed to meet diverse application requirements at a cost that can justify the use of cloud computing.


Towards a Fault-Resilient Cloud Management Stack

Xiaoen Ju, University of Michigan; Livio Soares, IBM T.J. Watson Research Center; Kang G. Shin, University of Michigan; Kyung Dong Ryu, IBM T.J. Watson Research Center

Cloud management stacks have become an important new layer in cloud computing infrastructure, simplifying the configuration and management of cloud computing environments. As the resource manager and controller of an entire cloud, a cloud management stack has a significant impact on the fault-resilience of a cloud platform. However, our preliminary study of the fault-resilience of OpenStack—an open source, state-of-the-art cloud management stack—shows that such an emerging software stack needs to be better designed and tested in order to serve as a building block for fault-resilient cloud environments. We discuss the issues identified by our fault-injection tool and make suggestions on how to strengthen cloud management stacks.
