Wednesday, 9:00 a.m.–10:30 a.m.
Wednesday, 10:30 a.m.–11:00 a.m.
Wednesday, 11:00 a.m.–12:30 p.m.
Session Chair: Marcos K. Aguilera, Microsoft Research
Zhe Zhang, Han Chen, and Hui Lei, IBM T. J. Watson Research Center
File cache management is among the most important factors affecting the performance of a cloud computing system. To achieve higher economies of scale, virtual machines are often overcommitted, which creates high memory pressure. It is therefore essential to eliminate duplicate data in the host and guest caches to boost performance. Existing cache deduplication solutions are based on complex algorithms or incur high runtime overhead, and are therefore not widely applicable. In this paper we present a simple and lightweight mechanism based on functional partitioning. In our mechanism, each cache is responsible for a smaller set of data, so the overall effective cache size becomes larger. Our method requires only a very small change to existing software (15 lines of new or modified code) yet achieves large performance improvements: more than 40% gains in high-memory-pressure settings.
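A minimal sketch of the functional-partitioning idea: cache each block in exactly one tier, so the host and guest caches never hold duplicate copies and the effective combined capacity is the sum of both. The hash-based routing rule and the capacities here are assumptions for illustration, not the paper's mechanism.

```python
from collections import OrderedDict

class LRUCache:
    """A minimal LRU cache keyed by block ID."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, block_id):
        if block_id in self.entries:
            self.entries.move_to_end(block_id)  # mark most recently used
            return True
        return False

    def insert(self, block_id):
        self.entries[block_id] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict least recently used

# Functional partitioning: each block is owned by exactly one cache,
# so no block is ever stored twice across the host and guest tiers.
guest_cache = LRUCache(capacity=1000)
host_cache = LRUCache(capacity=1000)

def owner(block_id):
    # Assumed routing rule: hash-partition the block ID space.
    return guest_cache if hash(block_id) % 2 == 0 else host_cache

def read_block(block_id):
    cache = owner(block_id)
    if cache.lookup(block_id):
        return "hit"
    cache.insert(block_id)  # fetch from disk, cache in one tier only
    return "miss"
```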
Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Ion Stoica, and Randy Katz, University of California, Berkeley
Modern datacenters support a large number of applications with diverse performance requirements. These performance requirements are expressed at the application layer as high-level service-level objectives (SLOs). However, large-scale distributed storage systems are unaware of these high-level SLOs. This lack of awareness results in poor performance when workloads from multiple applications are consolidated onto the same storage cluster to increase utilization. In this paper, we argue that because SLOs are expressed at a high level, a high-level control mechanism is required. This is in contrast to existing approaches, which use block- or disk-level mechanisms. These require manual translation of high-level requirements into low-level parameters. We present Frosting, a request scheduling layer on top of a distributed storage system that allows application programmers to specify their high-level SLOs directly. Frosting improves over the state of the art by automatically translating high-level SLOs into internal scheduling parameters and by using feedback control to adapt these parameters to changes in the workload. Our preliminary results demonstrate that our overlay approach can multiplex both latency-sensitive and batch applications to increase utilization, while still maintaining a 100ms 99th percentile latency SLO for latency-sensitive clients.
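A feedback loop of the kind Frosting describes (measure the 99th-percentile latency, adjust a scheduling parameter) can be sketched as a simple proportional controller. The gain, update rule, and parameter names below are assumptions, not Frosting's internals.

```python
def percentile(samples, p):
    """Return the p-th percentile of a list of latency samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(len(ordered) * p / 100))
    return ordered[index]

class SLOController:
    """Proportional controller: translate a high-level latency SLO
    into a low-level scheduling weight for latency-sensitive requests."""
    def __init__(self, slo_ms=100.0, gain=0.002):
        self.slo_ms = slo_ms   # target 99th percentile latency
        self.gain = gain       # how aggressively to react to error
        self.weight = 0.5      # share given to the latency-sensitive queue

    def update(self, latency_samples_ms):
        p99 = percentile(latency_samples_ms, 99)
        error = p99 - self.slo_ms
        # Above the SLO: shift weight toward latency-sensitive requests;
        # below it: release capacity back to batch work.
        self.weight = min(0.95, max(0.05, self.weight + self.gain * error))
        return self.weight

controller = SLOController()
print(controller.update([20] * 99 + [250]))  # p99 over SLO: weight rises to 0.8
```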
Xing Lin, University of Utah, School of Computing; Yun Mao, AT&T Labs—Research; Feifei Li and Robert Ricci, University of Utah, School of Computing
A common problem with disk-based cloud storage services is that performance can vary greatly and become highly unpredictable in a multi-tenant environment. A fundamental reason is the interference between workloads co-located on the same physical disk. We observe that different IO patterns interfere with each other significantly, which makes the performance of different types of workloads unpredictable when they are executed concurrently. Unpredictability implies that users may not get a fair share of the system resources from the cloud services they are using. At the same time, replication is commonly used in cloud storage for high reliability. Connecting these two facts, we propose a cloud storage system designed to minimize workload interference without increasing storage costs or sacrificing the overall system throughput. Our design leverages log-structured disk layout, chain replication and a workload-based replica selection strategy to minimize interference, striking a balance between performance and fairness. Our initial results suggest that this approach is a promising way to improve the performance and predictability of cloud storage.
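One plausible reading of the replica-selection strategy: since replication already provides several copies of each block, requests can be segregated by workload type so that sequential streams and random I/O never share a spindle. A hedged sketch; the classifier and placement policy are assumptions, not the paper's design.

```python
# Sketch: route each request to the replica whose disk is already serving
# the same kind of workload. Assumed three-way replication.
REPLICAS = {"blk-42": ["disk-a", "disk-b", "disk-c"]}  # toy replica map
disk_mode = {}  # disk -> "seq" or "rand", set by the first workload routed there

def classify(request_offsets):
    """Crude classifier: monotonically increasing offsets look sequential."""
    return "seq" if request_offsets == sorted(request_offsets) else "rand"

def pick_replica(block_id, request_offsets):
    mode = classify(request_offsets)
    disks = REPLICAS[block_id]
    # Prefer a disk already dedicated to this workload type ...
    for disk in disks:
        if disk_mode.get(disk) == mode:
            return disk
    # ... otherwise claim an unassigned disk for it.
    for disk in disks:
        if disk not in disk_mode:
            disk_mode[disk] = mode
            return disk
    return disks[0]  # all disks claimed by the other type: fall back

print(pick_replica("blk-42", [0, 4096, 8192]))  # sequential: claims disk-a
print(pick_replica("blk-42", [7, 9000, 512]))   # random: claims disk-b
```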
Wednesday, 12:30 p.m.–1:30 p.m.
Wednesday, 1:30 p.m.–3:00 p.m.
Session Chair: Byung-Gon Chun, Yahoo! Research
James Horey, Edmon Begoli, Raghul Gunasekaran, Seung-Hwan Lim, and James Nutaro, Oak Ridge National Laboratory
Infrastructure-as-a-Service has revolutionized the manner in which users commission computing infrastructure. Coupled with Big Data platforms (Hadoop, Cassandra), IaaS has democratized the ability to store and process massive datasets. For users who need to customize or create new Big Data stacks, however, readily available solutions do not yet exist. Users must first acquire the necessary cloud computing infrastructure and manually install the prerequisite software. For complex distributed services this can be a daunting challenge. To address this issue, we argue that a distributed service should be viewed as a single application consisting of virtual machines. Users should no longer be concerned about individual machines or their internal organization. To illustrate this concept, we introduce Cloud-Get, a distributed package manager that enables the simple installation of distributed services in a cloud computing environment. Cloud-Get enables users to instantiate and modify distributed services, including Big Data services, using simple commands. Cloud-Get also simplifies the creation of new distributed services via standardized package definitions.
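The abstract does not give Cloud-Get's actual syntax or package format, so everything below is hypothetical, meant only to illustrate the idea of a package definition that names VM roles rather than individual machines.

```python
class FakeVM:
    """Stand-in for a launched virtual machine."""
    def __init__(self, image):
        self.image = image
    def run(self, command):
        print("run:", command)

class FakeProvisioner:
    """Stand-in for an IaaS provisioning API."""
    def launch(self, image):
        return FakeVM(image)

# Hypothetical package definition: roles and counts, not machine names.
cassandra_package = {
    "name": "cassandra",
    "roles": {
        "seed":   {"count": 1, "image": "ubuntu-12.04", "installs": ["cassandra"]},
        "worker": {"count": 4, "image": "ubuntu-12.04", "installs": ["cassandra"]},
    },
}

def install(package, provisioner):
    """Treat the whole distributed service as one application: acquire
    VMs for every role, then configure them as a unit."""
    vms = {role: [provisioner.launch(spec["image"]) for _ in range(spec["count"])]
           for role, spec in package["roles"].items()}
    for role, hosts in vms.items():
        for vm in hosts:
            for software in package["roles"][role]["installs"]:
                vm.run("apt-get install -y " + software)
    return vms

install(cassandra_package, FakeProvisioner())
```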
Ganesh Ananthanarayanan, Ali Ghodsi, Scott Shenker, and Ion Stoica, University of California, Berkeley
Despite prior research on outlier mitigation, our analysis of jobs from the Facebook cluster shows that outliers still occur, especially in small jobs. Small jobs are particularly sensitive to long-running outlier tasks because of their interactive nature. Outlier mitigation strategies rely on comparing different tasks of the same job and launching speculative copies for the slower tasks. However, small jobs execute all their tasks simultaneously, which does not leave sufficient time to observe and compare tasks. Building on the observation that clusters are underutilized, we take speculation to its logical extreme: run full clones of jobs to mitigate the effect of outliers. The heavy-tail distribution of job sizes implies that we can impact most jobs without using many resources. Trace-driven simulations show that cloning improves the average completion time of all small jobs by 47%, at the cost of just 3% extra resources.
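The resource arithmetic behind cloning rests on the heavy tail: if the smallest 90% of jobs account for only a few percent of total cluster work, cloning every small job is cheap. A toy calculation under an assumed Pareto-like size distribution (the real trace numbers differ):

```python
import random

random.seed(1)
# Assumed heavy-tailed job sizes (task counts); this only illustrates
# the shape of the argument, not the Facebook trace itself.
job_sizes = [int(random.paretovariate(1.2)) + 1 for _ in range(10000)]

threshold = sorted(job_sizes)[int(0.9 * len(job_sizes))]  # "small" = bottom 90%
small = [s for s in job_sizes if s <= threshold]

total = sum(job_sizes)
clone_cost = sum(small)  # one extra full clone per small job
print(f"jobs cloned: {len(small) / len(job_sizes):.0%}")
print(f"extra resources: {clone_cost / total:.1%} of cluster work")
```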
Edward Bortnikov, Yahoo! Labs, Haifa, Israel; Ari Frank, Affectivon Inc., Kiryat Tivon, Israel; Eshcar Hillel, Yahoo! Labs, Haifa, Israel; Sriram Rao, Yahoo! Labs, Santa Clara, CA
Extremely slow, or straggler, tasks are a major performance bottleneck in map-reduce systems. The Hadoop infrastructure makes an effort both to avoid them (by minimizing remote data accesses) and to handle them at runtime (through speculative execution). However, the mechanisms in place neither guarantee the avoidance of performance hotspots in task scheduling, nor provide any easy way to tune the timely detection of stragglers. We suggest a machine-learning approach to address these problems, and introduce a slowdown predictor: an oracle that forecasts how much slower a task will run on a given node, compared to similar tasks. Slowdown predictors can be embedded in the map-reduce infrastructure to improve the agility and timeliness of scheduling decisions. We provide an initial evaluation to demonstrate the viability of our approach, and discuss use cases for the new paradigm.
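A slowdown predictor is essentially a regression problem: from node and task features, forecast how much slower a task will run than its peers. A sketch using scikit-learn on synthetic data; the feature set is invented for illustration, and the paper does not specify its model.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Assumed per-task features (the paper does not enumerate its feature set).
# Columns: node CPU load, node disk busy fraction, fraction of remote
# input, concurrent tasks on the node.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5000, 4))
# Synthetic ground truth: slowdown grows with load and remote reads.
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + 0.8 * X[:, 2] + rng.normal(0, 0.1, 5000)

model = GradientBoostingRegressor().fit(X[:4000], y[:4000])

# The scheduler can then ask: how much slower would this task run here?
candidate_node = np.array([[0.9, 0.7, 0.3, 0.5]])
predicted = model.predict(candidate_node)[0]
print(f"predicted slowdown: {predicted:.2f}x vs. similar tasks")
```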
Wednesday, 3:00 p.m.–3:15 p.m.
Wednesday, 3:15 p.m.–4:45 p.m.
Session Chair: Livio Soares, IBM Research
Jeongseob Ahn, Changdae Kim, and Jaeung Han, KAIST; Young-ri Choi, UNIST; Jaehyuk Huh, KAIST
Although virtual machine (VM) migration has been used to avoid conflicts on traditional system resources such as CPU and memory, micro-architectural resources, such as shared caches, memory controllers, and non-uniform memory access (NUMA) affinity, have relied only on intra-system scheduling to reduce contention. This study shows that live VM migration can be used to mitigate contention for micro-architectural resources. Such cloud-level VM scheduling widens the scope of VM selection for architectural shared resources beyond a single system, and thus improves the opportunity to further reduce conflicts. This paper proposes and evaluates two cluster-level virtual machine scheduling techniques, for cache sharing and NUMA affinity, that do not require any prior knowledge of the behavior of VMs.
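A cluster-level scheduler along these lines needs a placement score: estimate contention among co-located VMs and migrate when another placement lowers it. A toy sketch, assuming per-VM cache misses per kilo-instruction (MPKI) as the contention signal; the metric and the exhaustive search are illustrative, not the paper's algorithms.

```python
from itertools import combinations

# Assumed per-VM profile: last-level-cache misses per kilo-instruction.
vm_mpki = {"vm1": 30.0, "vm2": 28.0, "vm3": 2.0, "vm4": 3.0}

def host_contention(vms):
    """Toy metric: cache-hungry VMs sharing a socket hurt each other,
    so score a host by the sum of pairwise products of MPKI."""
    return sum(vm_mpki[a] * vm_mpki[b] for a, b in combinations(vms, 2))

def best_pairing(vms):
    """Search two-VM-per-host placements for the lowest total contention."""
    a, b, c, d = vms
    options = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]
    return min(options, key=lambda hosts: sum(host_contention(h) for h in hosts))

placement = best_pairing(list(vm_mpki))
print("migrate toward:", placement)  # pairs each cache-hungry VM with a quiet one
```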
Navraj Chohan, Anand Gupta, Chris Bunch, Sujay Sundaram, and Chandra Krintz, University of California, Santa Barbara
Cloud technology is evolving at a rapid pace with innovation occurring throughout the software stack. While updates to Software-as-a-Service (SaaS) products require a simple push of code to the production servers or platform, updates to the Infrastructure-as-a-Service (IaaS) or Platform-as-a-Service (PaaS) layers require more intricate procedures to prevent disruption to services at higher abstraction layers.
In this work we address the need for rolling upgrades to PaaS systems. We do so with the AppScale PaaS, a multi-application, multi-language, multi-infrastructure, and multi-datastore platform. Our design and implementation allow applications and tenants to be migrated live from one cloud deployment to another with guaranteed transaction semantics and minimal performance degradation. In this paper we motivate the need for PaaS migration support and empirically evaluate migrations between two AppScale deployments using highly scalable datastores.
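The abstract does not detail the migration mechanism; the standard live-migration recipe (bulk copy, double-write, replay, cut over) gives a feel for what is involved. A toy sketch, not necessarily AppScale's design:

```python
class Store:
    """Tiny stand-in for a datastore deployment."""
    def __init__(self):
        self.data = {}
        self.log = []  # writes applied after the snapshot

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

source, target = Store(), Store()
source.write("user:1", "alice")

# 1. Bulk-copy a snapshot while the source keeps serving writes.
snapshot = dict(source.data)
source.log.clear()
target.data.update(snapshot)

# 2. Double-write: new transactions go to both deployments.
def write_both(key, value):
    source.write(key, value)
    target.write(key, value)

write_both("user:2", "bob")

# 3. Replay any writes that raced the snapshot, then cut over traffic.
for key, value in source.log:
    target.data[key] = value
assert target.data == source.data  # target has caught up: switch traffic
```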
Raja R. Sambasivan and Gregory R. Ganger, Carnegie Mellon University
Automated management is critical to the success of cloud computing, given its scale and complexity. But most systems do not satisfy one of the key properties required for automation: predictability, which in turn relies on low variance. Most automation tools are not effective when variance is consistently high. Using automated performance diagnosis as a concrete example, this position paper argues that for automation to become a reality, system builders must treat variance as an important metric and make conscious decisions about where to reduce it. To help with this task, we describe a framework for reasoning about sources of variance in distributed systems and an example tool for helping to identify them.
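As one concrete way to treat variance as a first-class metric, a diagnosis tool can rank components by the squared coefficient of variation of their latencies and flag consistently noisy ones. A minimal sketch; the threshold is an assumption, not from the paper.

```python
import statistics

def squared_cov(samples):
    """Squared coefficient of variation: variance normalized by the
    squared mean, a scale-free measure of how unpredictable a
    component's latency is."""
    mean = statistics.fmean(samples)
    return statistics.pvariance(samples) / (mean * mean)

# Per-component response times (ms) extracted from request traces.
traces = {
    "metadata-server": [10, 11, 10, 12, 10, 11],
    "storage-node-7":  [8, 45, 9, 80, 10, 7],  # erratic: a diagnosis suspect
}

for component, samples in traces.items():
    c2 = squared_cov(samples)
    flag = "HIGH-VARIANCE" if c2 > 0.5 else "ok"
    print(f"{component:>16}: C^2 = {c2:.2f} ({flag})")
```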
Wednesday, 4:45 p.m.–5:00 p.m.
Wednesday, 5:00 p.m.–6:30 p.m.
Session Chair: Sharon Goldberg, Boston University
Justin Thaler, Mike Roberts, Michael Mitzenmacher, and Hanspeter Pfister, Harvard University, School of Engineering and Applied Sciences
As the cloud computing paradigm has gained prominence, the need for verifiable computation has grown increasingly urgent. Protocols for verifiable computation enable a weak client to outsource difficult computations to a powerful, but untrusted server, in a way that provides the client with a guarantee that the server performed the requested computations correctly. By design, these protocols impose a minimal computational burden on the client, but they require the server to perform a very large amount of extra bookkeeping to enable a client to easily verify the results. Verifiable computation has thus remained a theoretical curiosity, and protocols for it have not been implemented in real cloud computing systems.
In this paper, we assess the potential of parallel processing to help make practical verification a reality, identifying abundant data parallelism in a state-of-the-art general-purpose protocol for verifiable computation. We implement this protocol on the GPU, obtaining 40-120× server-side speedups relative to a state-of-the-art sequential implementation. For benchmark problems, our implementation thereby reduces the slowdown of the server to within factors of 100-500× relative to the original computations requested by the client. Furthermore, we reduce the already small runtime of the client by 100×. Our results demonstrate the immediate practicality of using GPUs for verifiable computation, and more generally, that protocols for verifiable computation have become sufficiently mature to deploy in real cloud computing systems.
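The interactive-proof protocol the authors implement is too involved to reproduce here, but Freivalds' classic check for outsourced matrix multiplication conveys the flavor of verifiable computation: the untrusted server does the O(n^3) work, and the client verifies in O(n^2) with a tunable error bound. This is a different, far simpler protocol than the paper's.

```python
import random

def freivalds_verify(A, B, C, rounds=20):
    """Check A @ B == C in O(n^2) per round, versus O(n^3) to recompute.
    A false claim survives each round with probability at most 1/2."""
    n = len(A)
    for _ in range(rounds):
        r = [random.randint(0, 1) for _ in range(n)]
        Br = [sum(B[i][j] * r[j] for j in range(n)) for i in range(n)]
        ABr = [sum(A[i][j] * Br[j] for j in range(n)) for i in range(n)]
        Cr = [sum(C[i][j] * r[j] for j in range(n)) for i in range(n)]
        if ABr != Cr:
            return False  # the server's answer is certainly wrong
    return True  # a wrong answer slips through with prob <= 2**-rounds

# The "server" computes the product; the "client" only runs the check.
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[19, 22], [43, 50]]  # claimed result of A @ B
print(freivalds_verify(A, B, C))  # True
```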
Masoud Moshref and Minlan Yu, University of Southern California; Abhishek Sharma, University of Southern California and NEC; Ramesh Govindan, University of Southern California
Cloud operators increasingly need many fine-grained rules to better control individual network flows for various management tasks. While previous approaches have advocated placing rules either on hypervisors or switches, we argue that future data centers would benefit from leveraging rule processing capabilities at both for better scalability and performance. In this paper, we propose vCRIB, a virtualized Cloud Rule Information Base that allows operators to freely define different management policies without the need to consider underlying resource constraints. The challenge in our approach is the design of a vCRIB manager that automatically partitions and places rules at both hypervisors and switches to achieve a good trade-off between resource usage and performance.
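The placement problem at vCRIB's core can be pictured as packing rules into devices with different capacities and per-packet costs. A greedy toy version; the capacities, cost model, and policy are assumptions, not vCRIB's partitioning algorithm.

```python
# Toy trade-off: switches process rules fastest but hold few;
# hypervisors hold many but add per-packet CPU overhead.
DEVICES = {
    "switch-1":     {"capacity": 2000,  "cost_per_hit": 1.0},
    "hypervisor-a": {"capacity": 50000, "cost_per_hit": 4.0},
}

def place_rules(rules, devices):
    """Greedy sketch: hottest rules go to the cheapest (fastest) device
    until it fills, then overflow to the next. Returns rule -> device."""
    by_traffic = sorted(rules, key=lambda r: r["hits"], reverse=True)
    by_cost = sorted(devices, key=lambda d: devices[d]["cost_per_hit"])
    used = {d: 0 for d in devices}
    placement = {}
    for rule in by_traffic:
        for device in by_cost:
            if used[device] < devices[device]["capacity"]:
                placement[rule["id"]] = device
                used[device] += 1
                break
    return placement

rules = [{"id": i, "hits": 10000 // (i + 1)} for i in range(3000)]
placement = place_rules(rules, DEVICES)
print(sum(1 for d in placement.values() if d == "switch-1"), "rules on the switch")
```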
Bryan Ford, Yale University
The cloud model’s dependence on massive parallelism and resource sharing exacerbates the security challenge of timing side-channels. Timing Information Flow Control (TIFC) is a novel adaptation of IFC techniques that may offer a way to reason about, and ultimately control, the flow of sensitive information through systems via timing channels. With TIFC, objects such as files, messages, and processes carry not just content labels describing the ownership of the object’s “bits,” but also timing labels describing information contained in timing events affecting the object, such as process creation/termination or message reception. With two system design tools—deterministic execution and pacing queues—TIFC enables the construction of “timing-hardened” cloud infrastructure that permits statistical multiplexing, while aggregating and rate-limiting timing information leakage between hosted computations.
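Of the two design tools named, the pacing queue is the easier to sketch: it releases output only on fixed clock ticks, so an observer of message timing learns at most one quantized event per interval rather than fine-grained internal timing. A minimal illustration, with an assumed pacing interval.

```python
import time
from collections import deque

class PacingQueue:
    """Release queued messages only at fixed ticks, so output timing
    reveals at most one bit ("was a message ready?") per interval,
    rather than the sender's fine-grained internal timing."""
    def __init__(self, interval_s=0.010):
        self.interval_s = interval_s
        self.pending = deque()

    def send(self, message):
        self.pending.append(message)  # enqueue now, deliver on a tick

    def run(self, deliver, ticks):
        next_tick = time.monotonic()
        for _ in range(ticks):
            next_tick += self.interval_s
            time.sleep(max(0.0, next_tick - time.monotonic()))
            if self.pending:
                deliver(self.pending.popleft())  # at most one per tick

q = PacingQueue()
q.send("reply-1")
q.send("reply-2")
q.run(deliver=print, ticks=3)  # replies emerge on the 10ms clock edges
```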