8:00 am–9:00 am |
Monday |
Continental Breakfast
Mezzanine East/West
|
8:45 am–9:00 am |
Monday |
Program Co-Chairs: Irfan Ahmad, CloudPhysics, and Tim Kraska, Brown University
|
9:00 am–10:40 am |
Monday |
Session Chairs: Irfan Ahmad, CloudPhysics, and Tim Kraska, Brown University
Alexandru Agache and Costin Raiciu, University Politehnica of Bucharest Datacenter networks have evolved from simple trees to multi-rooted tree topologies such as FatTree or VL2 that provide many paths between any pair of servers to ensure high performance under all traffic patterns. The standard way to load balance traffic across these links is Equal Cost Multipathing (ECMP), which randomly places flows on paths. ECMP may wrongly place multiple flows on the same congested link, wasting as much as 60% of total capacity in worst-case scenarios for FatTree networks. These networks need information about the traffic they route to avoid collisions, by steering it towards idle paths, or by creating more capacity on the fly between groups of hot racks. Additionally, ECMP creates uncertainty about the path a given flow has taken, making network debugging difficult.
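For intuition, a minimal sketch of hash-based flow placement in the style of ECMP; the path names, flows, and hash choice below are illustrative assumptions, not taken from the paper:

    # Minimal illustration of ECMP-style flow placement: each flow is hashed
    # onto one of the equal-cost paths, so several flows can land on the same
    # link and congest it while other paths sit idle.
    import hashlib
    import random

    PATHS = ["path-0", "path-1", "path-2", "path-3"]  # hypothetical equal-cost paths

    def ecmp_path(flow):
        """Pick a path from the flow's 5-tuple hash, as an ECMP switch would."""
        key = "|".join(str(f) for f in flow).encode()
        digest = int(hashlib.md5(key).hexdigest(), 16)
        return PATHS[digest % len(PATHS)]

    random.seed(1)
    flows = [("10.0.0.%d" % random.randint(1, 16), "10.0.1.%d" % random.randint(1, 16),
              random.randint(1024, 65535), 80, "tcp") for _ in range(8)]

    load = {p: 0 for p in PATHS}
    for flow in flows:
        load[ecmp_path(flow)] += 1   # hash collisions concentrate flows on one path

    print(load)  # an uneven spread here is exactly the collision problem described above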
Luo Mai, Imperial College London; Chuntao Hong and Paolo Costa, Microsoft Research To cope with the ever-growing availability of training data, there have been several proposals to scale machine learning computation beyond a single server and distribute it across a cluster. While this enables reducing the training time, the observed speedup is often limited by network bottlenecks.
To address this, we design MLNET, a host-based communication layer that aims to improve the network performance of distributed machine learning systems. This is achieved through a combination of traffic reduction techniques (to diminish network load in the core and at the edges) and traffic management (to reduce average training time). A key feature of MLNET is its compatibility with existing hardware and software infrastructure, so it can be immediately deployed.
We describe the main techniques underpinning MLNET, and show through simulation that the overall training time can be reduced by up to 78%. While preliminary, our results indicate the critical role played by the network and the benefits of introducing a new communication layer to increase the performance of distributed machine learning systems.
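As one illustration of the traffic reduction idea, here is a sketch of aggregating worker gradients at an intermediate host before a single combined update crosses the congested core; the aggregation function and worker layout are assumptions for illustration, not MLNET's actual protocol:

    # Illustrative sketch: sum gradients from several co-located workers into
    # one update, so the core network carries one message instead of N.
    from typing import List

    def aggregate_gradients(worker_grads: List[List[float]]) -> List[float]:
        """Element-wise sum of gradients from several workers."""
        combined = [0.0] * len(worker_grads[0])
        for grad in worker_grads:
            for i, g in enumerate(grad):
                combined[i] += g
        return combined

    # Three hypothetical workers in the same rack each produce a gradient;
    # only the aggregated vector is forwarded towards the parameter server.
    worker_grads = [[0.1, -0.2, 0.05], [0.3, 0.0, -0.1], [-0.05, 0.1, 0.2]]
    print(aggregate_gradients(worker_grads))  # [0.35, -0.1, 0.15]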
Zahra Abbasi, Ming Xia, Meral Shirazipour, and Attila Takacs, Ericsson Research Not long ago, operators were struggling with middlebox deployment and traffic management across them. The service chaining problem was a well-studied subject which had to deal with the limitations of middleboxes and offer various techniques to overcome them to achieve the desired traffic steering. Only two years have passed since its official launch by ETSI, but network function virtualization (NFV) has already revolutionized the telecom industry by proposing a complete design paradigm shift in the way middleboxes are built and deployed. NFV requires the virtualization of middleboxes and other networking equipment, called virtual network functions (vNFs). This requirement will allow networking infrastructure operators to benefit from the same economies of scale and flexibility as the information technology community experienced with the advent of cloud computing. Beyond the capex/opex savings and faster time to market of new services, the cloudification of networking gives us the opportunity to rethink how networking equipment is designed and deployed.
Yiting Xia, Rice University; Mike Schlansker, HP Labs; T. S. Eugene Ng, Rice University; Jean Tourrilhes, HP Labs Most data centers deploy fixed network topologies. This brings difficulties to traffic optimization and network management, because bandwidth locked up in fixed links is not adjustable to traffic needs, and changes to network equipment require cumbersome rewiring of existing links. We believe the solution is to introduce topological flexibility that allows dynamic cable rewiring in the network. We design the OmniSwitch prototype architecture to realize this idea. It uses inexpensive small optical circuit switches to configure cable connections, and integrates them closely with Ethernet switches to provide large-scale connectivity. We design an example control algorithm for a traffic optimization use case, and demonstrate the power of topological flexibility using simulations. Our solution is effective in provisioning bandwidth for cloud tenants and reducing transmission hop count at low computation cost.
|
10:40 am–11:15 am |
Monday |
Break with Refreshments
Mezzanine East/West
|
11:15 am–12:30 pm |
Monday |
Session Chair: Xiaojun Liu, CloudPhysics
Husheng Zhou, Yangchun Fu, and Cong Liu, The University of Texas at Dallas Graphics processing units (GPUs) have been adopted by major cloud vendors, as GPUs provide orders-of-magnitude speedup for computation-intensive data-parallel applications. In the cloud, efficiently sharing GPU resources among multiple virtual machines (VMs) is not so straightforward. Recent research has been conducted to develop GPU virtualization technologies, making it feasible for VMs to share GPU resources in a reliable manner. This paper seeks to improve the efficiency of sharing GPU resources in the cloud for accelerating general-purpose workloads. Our key observation is that redundant GPU computation requests are being seen in many GPU-accelerated workloads in the cloud, such as cloud gaming where multiple clients playing the same game call GPUs to perform physics simulation. We have measured this redundancy using a gaming case study, and found that more than 24% (47%) of the GPU computation requests called by the same VM (multiple VMs) are identical. To exploit this redundancy, we present GRU (GPU Result re-Use), a GPU sharing, result memoization and reuse ecosystem in a cloud environment. GRU transparently enables VMs in the cloud to share a single GPU efficiently, and memoizes GPU computation results for reuse. It leverages GPU full-virtualization technology, which enables GPU result memoization and reuse without modification of existing device drivers and operating systems. We have implemented GRU on top of the Xen hypervisor. Preliminary experiments show that GRU is able to achieve a significant speedup of up to 18 times compared to the state-of-the-art GPU virtualization framework, while adding a rather small amount of runtime overhead.
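A minimal sketch of result memoization for GPU compute requests in the spirit described above; the hashing scheme, cache policy, and request names are assumptions for illustration, not GRU's actual design:

    # Identical requests, whether from the same VM or different VMs, return the
    # cached result instead of re-running the kernel on the shared GPU.
    import hashlib

    class ResultCache:
        def __init__(self):
            self._cache = {}

        def _key(self, kernel_name, args_blob):
            return hashlib.sha256(kernel_name.encode() + args_blob).hexdigest()

        def run(self, kernel_name, args_blob, gpu_execute):
            key = self._key(kernel_name, args_blob)
            if key in self._cache:
                return self._cache[key]                      # reuse: no GPU work needed
            result = gpu_execute(kernel_name, args_blob)     # miss: run on the GPU
            self._cache[key] = result
            return result

    # Hypothetical usage: two VMs issue the same physics-simulation request.
    cache = ResultCache()
    simulate = lambda name, blob: b"simulated-state"         # stand-in for a real GPU call
    r1 = cache.run("physics_step", b"scene-42", simulate)    # executes on the GPU
    r2 = cache.run("physics_step", b"scene-42", simulate)    # served from the cache
    assert r1 == r2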
Du Su and Yi Lu, University of Illinois at Urbana-Champaign Energy consumption has become a significant fraction of the total cost of ownership of data centers. While much work has focused on improving power efficiency per unit of computation, little attention has been paid to power delivery, which currently wastes 10-20% of total energy consumption even before any computation takes place. A new power delivery architecture using series-stacked servers has recently been proposed in the power community. However, the reduction in power loss depends on the difference in power consumption of the series-stacked servers: The more balanced the computation loads, the more reduction in power conversion loss.
In this preliminary work, we implemented GreenMap, a modified MapReduce framework that assigns tasks in synchronization, and computed the conversion loss based on the measured current profile. At all loads, GreenMap achieves 81x-138x reduction in power conversion loss from the commercial-grade high voltage converter used by data centers, which is equivalent to 15% reduction in total energy consumption. The average response time of GreenMap suffers no degradation when load reaches 0.6 and above, but at loads below 0.6, the response time suffers a 26-42% increase due to task synchronization. For the low-load region, we describe the use of GreenMap with dynamic scaling to achieve a favorable tradeoff between response time and power efficiency.
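A back-of-the-envelope sketch of why balancing loads across series-stacked servers cuts conversion loss; the loss model, bus voltage, and currents below are illustrative assumptions, not measurements from the paper:

    # In a series-stacked design, the differential converter only processes the
    # current mismatch between stacked servers, so loss shrinks as the
    # per-server currents converge (which is what synchronized tasks achieve).
    def conversion_loss(currents_amps, bus_voltage=12.0, converter_loss_frac=0.10):
        """Loss ~ converter inefficiency applied only to the mismatch power."""
        mean = sum(currents_amps) / len(currents_amps)
        mismatch_power = sum(abs(i - mean) for i in currents_amps) * bus_voltage
        return converter_loss_frac * mismatch_power

    print(conversion_loss([10.0, 18.0, 6.0, 14.0]))    # imbalanced stack: 19.2 W lost
    print(conversion_loss([12.0, 12.1, 11.9, 12.0]))   # synchronized tasks: 0.24 W lost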
Filipe Manco, Joao Martins, Kenichi Yasukata, Jose Mendes, Simon Kuenzer, and Felipe Huici, NEC Europe Ltd. The confluence of a number of relatively recent trends including the development of virtualization technologies, the deployment of micro datacenters at PoPs, and the availability of microservers, opens up the possibility of evolving the cloud, and the network it is connected to, towards a superfluid cloud: a model where parties other than infrastructure owners can quickly deploy and migrate virtualized services throughout the network (in the core, at aggregation points and at the edge), enabling a number of novel use cases including virtualized CPEs and on-the-fly services, among others.
Towards this goal, we identify a number of required mechanisms and present early evaluation results of their implementation. On an inexpensive commodity server, we are able to concurrently run up to 10,000 specialized virtual machines, instantiate a VM in as little as 10 milliseconds, and migrate it in under 100 milliseconds.
|
12:30 pm–2:00 pm |
Monday |
Luncheon for Workshop Attendees
Terra Courtyard
|
2:00 pm–3:30 pm |
Monday |
|
3:30 pm–4:00 pm |
Monday |
Break with Refreshments
Mezzanine East/West
|
4:00 pm–5:40 pm |
Monday |
Session Chair: Swaminathan Sundararaman, SanDisk
Riza O. Suminto, University of Chicago; Agung Laksono and Anang D. Satria, Surya University; Thanh Do, Microsoft Gray Systems Lab; Haryadi S. Gunawi, University of Chicago Modern distributed systems ("cloud systems") have emerged as a dominant backbone for many of today's applications. They come in different forms such as scale-out file systems, key-value stores, computing frameworks, synchronization and cluster management services. As these systems collectively become the "cloud operating system", users expect high dependability including performance stability. Unfortunately, the complexity of the software and environment in which they must run has outpaced existing testing and debugging tools. Cloud systems must run at scale with different topologies, execute complex distributed protocols, face load fluctuations and a wide range of hardware faults, and serve users with diverse job characteristics.
One important type of failure is the performance failure, a situation where a system (e.g., Hadoop) does not deliver the expected performance (e.g., a job takes 10x longer than usual). Conversations with cloud engineers reflect that performance stability is often more important than performance optimization; when performance failures happen, users are frustrated, systems waste and underutilize resources, and long debugging efforts are required to find and fix the problems. Sadly, performance failures are still common; our previous work shows that 22% of vital issues reported by cloud system developers relate to performance bugs.
In this paper, our focus is to answer the following three questions: What is the root-cause anatomy of performance bugs that appear in cloud systems? What is missing within the state of the art of detecting performance bugs? What novel directions can prevent performance failures from happening in the field?
David Goldberg and Yinan Shan, eBay The theme of this paper is that anomaly detection splits into two parts: developing the right features, and then feeding these features into a statistical system that detects anomalies in the features. Most literature on anomaly detection focuses on the second part. Our goal is to illustrate the importance of the first part. We do this with two real-life examples of anomaly detectors in use at eBay.
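A small sketch of the two-part split described above: first derive a feature from raw events, then feed it to a generic statistical detector. The feature here (hourly transaction counts) and the thresholds are made-up examples, not one of eBay's detectors:

    # The detector is deliberately generic (robust z-scores); the real work is
    # choosing a feature in which anomalies stand out.
    import statistics

    def robust_zscores(values):
        """Distance from the median, in units of the median absolute deviation."""
        med = statistics.median(values)
        mad = statistics.median(abs(v - med) for v in values) or 1e-9
        return [(v - med) / (1.4826 * mad) for v in values]

    hourly_counts = [102, 98, 110, 105, 99, 430, 101, 97]   # one suspicious spike
    scores = robust_zscores(hourly_counts)
    anomalies = [i for i, z in enumerate(scores) if abs(z) > 3.5]
    print(anomalies)  # -> [5]; a poor feature choice would hide this spike entirely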
Ricardo Koller and Canturk Isci, IBM T. J. Watson Research Center; Sahil Suneja and Eyal de Lara, University of Toronto Modern cloud applications are distributed across a wide range of instances of multiple types, including virtual machines, containers, and baremetal servers. Traditional approaches to monitoring and analytics fail in these complex, distributed and diverse environments. They are too intrusive and heavy-handed for short-lived, lightweight cloud instances, and cannot keep up with the rapid pace of change in the cloud with continuous dynamic scheduling, provisioning and auto-scaling. We introduce a unified monitoring and analytics architecture designed for the cloud. Our approach leverages virtualization and containerization to decouple monitoring from instance execution and health. Moreover, it provides a uniform view of systems regardless of instance type, and operates without interfering with the end-user context. We describe an implementation of our approach in an actual deployment, and discuss our experiences and observed results.
James Snee, Lucian Carata, Oliver R. A. Chick, Ripduman Sohan, Ramsey M. Faragher, Andrew Rice, and Andy Hopper, University of Cambridge Applications executing on a hypervisor or in a container experience a lack of performance isolation from other services executing on shared resources. Latency-sensitive applications executing in the cloud therefore have highly variable response times, yet attributing the additional latency caused by virtualization overheads on individual requests is an unsolved problem.
We present Soroban, a framework for attributing latency to either the cloud provider or their customer. Soroban allows developers to instrument applications, such as web servers, to determine, for each request, how much of the latency is due to the cloud provider, and how much is due to the consumer’s application or service. With this support, Soroban enables cloud providers to provision based on acceptable latencies, adopt fine-grained charging levels that reflect the latency demands of users, and attribute performance anomalies to either the cloud provider or their consumer. We apply Soroban to an HTTP server and show that it identifies when the cause of latency is due to a provider-induced activity, such as underprovisioning a host, or due to the software run by the customer.
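A simplified sketch of per-request latency attribution in the spirit of the framework above; using CPU steal time as the provider-side signal is an assumption for illustration, not Soroban's actual mechanism:

    # Splits a request's wall-clock latency into a provider-attributed part
    # (time the guest's vCPUs were involuntarily descheduled) and a
    # customer-attributed remainder. Linux-only: reads /proc/stat.
    import time

    def read_steal_seconds():
        """Seconds this guest's vCPUs spent stolen by the hypervisor."""
        with open("/proc/stat") as f:
            fields = f.readline().split()    # aggregate "cpu" line
        user_hz = 100                        # typical USER_HZ; an assumption
        return int(fields[8]) / user_hz      # 8th value after "cpu" is steal time

    def attribute(handler):
        """Run a request handler and split its latency between provider and customer."""
        steal_before, start = read_steal_seconds(), time.monotonic()
        handler()
        elapsed = time.monotonic() - start
        provider = min(read_steal_seconds() - steal_before, elapsed)
        return {"provider_s": provider, "customer_s": elapsed - provider}

    print(attribute(lambda: sum(range(10**6))))  # e.g. {'provider_s': 0.0, 'customer_s': ...}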
|
6:00 pm–7:00 pm |
Monday |
Joint Poster Session and Happy Hour with HotStorage
Mezzanine East/West
|