Workshop Program

All sessions will be held in the Bayshore Room unless otherwise noted.

The workshop papers are available for download below. Copyright to the individual works is retained by the author(s).

Attendee Files 
CoolDC '16 Attendee List (updated 3/17/16)

 

Saturday, March 19, 2016

8:00 am–9:00 am Saturday

Continental Breakfast

9:00 am–10:00 am Saturday

Session Chair: Thomas Wenisch, University of Michigan

Keynote Address

9:00 am-10:00 am

Luiz André Barroso, Google

Luiz André Barroso is a Google Fellow. His interests range from distributed system software infrastructure to the design of Google's computing platform. Prior to Google, Luiz was a member of the research staff at Digital Equipment Corporation and Compaq, where his group did some of the pioneering work on multi-core architectures. He has B.S. and M.S. degrees in Electrical Engineering from the Pontifícia Universidade Católica of Rio de Janeiro, and a Ph.D. in Computer Engineering from the University of Southern California. Luiz is a Fellow of the ACM and the AAAS.

10:00 am–10:30 am Saturday

Break with Refreshments

11:00 am–12:30 pm Saturday

Measurement and Monitoring

Session Chair: Weisong Shi, Wayne State University

A New Perspective on Energy Accounting in Multi-Tenant Data Centers

Mohammad A. Islam and Shaolei Ren, University of California, Riverside

Energy accounting plays a crucial role in optimizing data center energy efficiency. Nonetheless, in a multi-tenant data center, it is challenging to fairly account for non-IT energy on an individual tenant basis, because each non-IT system (e.g., power supply and cooling) is shared by multiple tenants and only the system-level non-IT energy consumption can be measured. Existing policies, e.g., proportionally or equally attributing non-IT energy to tenants, may attribute different energy to two "equivalent" tenants and hence are not fair. In this paper, we propose QSEA, a quick Shapley value-based energy accounting policy for multi-tenant data centers. QSEA is provably fair and also easy to implement with little to zero overhead. We run trace-based simulations and demonstrate that, compared to the exact Shapley value approach that has an exponential complexity, QSEA yields almost the same energy accounting result while having a negligible computation time.

Available Media
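
The exact Shapley-value baseline that QSEA approximates can be sketched in a few lines. The sketch below is illustrative only and is not the authors' QSEA policy: the non_it_energy cost model and the tenant loads are invented, and the point is simply that attribution requires averaging each tenant's marginal contribution over all join orders, which is exponential in the number of tenants (tenants with identical loads receive identical shares, the fairness property the abstract refers to).

```python
# Illustrative sketch (not the paper's QSEA algorithm): exact Shapley-value
# attribution of shared non-IT energy among tenants. The non_it_energy()
# model and tenant loads below are invented for demonstration only.
from itertools import permutations

# Hypothetical IT loads (kW) per tenant over one accounting interval.
tenant_load = {"A": 40.0, "B": 40.0, "C": 20.0}

def non_it_energy(coalition):
    """Toy model: shared cooling/power-delivery energy grows super-linearly
    with the coalition's total IT load. Any monotone cost model works here."""
    total = sum(tenant_load[t] for t in coalition)
    return 0.1 * total + 0.002 * total ** 2

def exact_shapley(tenants):
    """Average each tenant's marginal contribution over all join orders.
    This enumerates n! permutations, i.e., the exponential-cost baseline."""
    shares = {t: 0.0 for t in tenants}
    orders = list(permutations(tenants))
    for order in orders:
        coalition = []
        for t in order:
            before = non_it_energy(coalition)
            coalition.append(t)
            shares[t] += non_it_energy(coalition) - before
    return {t: v / len(orders) for t, v in shares.items()}

print(exact_shapley(list(tenant_load)))  # A and B have equal loads, so equal shares
```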

Seeing into a Public Cloud: Monitoring the Massachusetts Open Cloud

11:30 am-12:00 pm

Ata Turk, Hao Chen, Ozan Tuncer, Hua Li, Qingqing Li, Orran Krieger, and Ayse K. Coskun, Boston University

Cloud users today have little visibility into the performance characteristics, power consumption, and utilization of cloud resources; and the cloud has little visibility into user application performance requirements and critical metrics such as response time and throughput. This paper outlines new efforts to reduce the information gap between cloud users and the cloud. We first present a scalable monitoring platform to collect and retain rich information on a regional public cloud. Second, we present two motivating use cases that leverage the collected information: (1) participation in emerging smart grid demand response programs in order to reduce datacenter energy costs and stabilize power grid demands, and (2) budgeting available power to applications via peak shaving. This work is done in the context of the Massachusetts Open Cloud (MOC), a new public cloud project that has a central goal of enabling cloud research.

Available Media
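
As a rough illustration of the second use case, budgeting available power to applications via peak shaving can be reduced to clipping the cluster's aggregate draw to a cap and scaling each application's budget proportionally. This minimal sketch is not the MOC monitoring platform; the application names, requested wattages, and the proportional policy are assumptions made for illustration.

```python
# Illustrative sketch (not the MOC implementation): clip total power to a cap
# and split the budget across applications in proportion to their requests.
# All names and numbers are invented for illustration.
def shave_peak(requests_watts, cap_watts):
    """Return per-application power budgets that never exceed cap_watts."""
    total = sum(requests_watts.values())
    if total <= cap_watts:
        return dict(requests_watts)          # no shaving needed
    scale = cap_watts / total                 # proportional reduction
    return {app: req * scale for app, req in requests_watts.items()}

requests = {"web-tier": 1200.0, "analytics": 800.0, "batch": 500.0}
print(shave_peak(requests, cap_watts=2000.0))
```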

Trinity Facilities and Operations Planning and Preparation: Early Experiences, Successes, and Lessons Learned

12:00 pm-12:30 pm

Ron Velarde, Carolyn Connor, Alynna Montoya-Wiuff, and Cindy Martin, Los Alamos National Laboratory

There is considerable interest in achieving a 1000-fold increase in supercomputing power in the next decade, but the challenges are formidable. The need to significantly decrease power usage and drastically increase energy efficiency has become pervasive in the high performance computing community, extending from chip design to data center design and operations. In this paper the authors present a short summary of early experience, successes, and lessons learned with respect to facilities, operations, and monitoring of the Trinity Supercomputer, a project of the New Mexico Alliance for Computing at Extreme Scale (ACES), a collaboration between Los Alamos National Laboratory (LANL) and Sandia National Laboratories (SNL), during the facility preparation and pre-acceptance testing phases of the project. The Trinity Supercomputer, which is designed to exceed 40 petaflops, is physically located at Los Alamos' Strategic Computing Center (SCC) and is a next step toward the goal of exascale computing (a million trillion operations per second). Discussion topics include facilities infrastructure upgrades, Sanitary Effluent Reclamation Facility (SERF) water use, adaptive design and installation approaches, scalability and stability of monitoring systems, and early power-capping investigation results.

Available Media

12:30 pm–2:00 pm Saturday

Luncheon for Workshop Attendees

2:00 pm–3:30 pm Saturday

Reduce the Waste

Session Chair: Thomas Wenisch, University of Michigan

Hunt for Unused Servers

2:00 pm-2:30 pm

Nikolai Joukov and Vladislav Shorokhov, modelizeIT Inc

Modern enterprises critically depend on IT. However, enterprise IT environments are complex and poorly documented. As a result, various IT components are forgotten and unused. Recent estimates show that 30% of servers in datacenters are unused on average. Elimination of unused servers (physical and virtual) and software instances decreases costs, electricity consumption, risks of problems causing business interruptions, and security exposures.

IT components form complex interdependent graphs. It is intuitive to declare the nodes not used by other nodes to be unused. However, the reality is a lot more complex: servers (even if unused) are highly interconnected. This paper has two main contributions. 1) We present a practical method to detect unused servers based on the dependency graphs. The method relies on dependency classification and propagation of usage information along the graph. 2) We apply and evaluate the topological method and utilization-based approaches for real enterprise datacenters. We benchmark and compare both methods in terms of detection error rates.

Available Media
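
A minimal sketch of the topological idea, propagating usage from externally used servers along the dependency graph and flagging everything unreachable as a candidate for retirement, is shown below. It omits the paper's dependency classification (e.g., distinguishing monitoring-only connections), and the server names and edges are hypothetical.

```python
# Illustrative sketch of usage propagation along a dependency graph (not the
# paper's full method, which also classifies dependency types). Servers and
# edges below are hypothetical. An edge (a, b) means "a depends on b".
from collections import defaultdict, deque

edges = [("web01", "app01"), ("app01", "db01"),
         ("monitor01", "db02"),          # monitoring-only dependency
         ("app01", "cache01")]
externally_used = {"web01"}              # servers with real end-user traffic

def find_used(edges, roots):
    """Mark every server reachable from an externally used server as used."""
    deps = defaultdict(list)
    nodes = set()
    for a, b in edges:
        deps[a].append(b)
        nodes.update((a, b))
    used, queue = set(roots), deque(roots)
    while queue:
        for b in deps[queue.popleft()]:
            if b not in used:
                used.add(b)
                queue.append(b)
    return used, nodes - used

used, unused = find_used(edges, externally_used)
print("candidate unused servers:", sorted(unused))  # db02, monitor01
```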

Reducing Execution Waste in Priority Scheduling: a Hybrid Approach

2:30 pm-3:00 pm

Derya Çavdar, Bogazici University; Lydia Y. Chen, IBM Research Zürich Lab; Fatih Alagöz, Bogazici University

Guaranteeing quality for differentiated services while ensuring resource efficiency is an important and yet challenging problem in large computing clusters. Priority scheduling is commonly adopted in production systems to minimize the response time of high-priority workload by preempting the execution of low-priority workload when faced with limited resources. As a result, the system performance may suffer not only from the long queueing time of low-priority workload due to resource starvation, but also from non-negligible execution waste owing to repetitive evictions. In this paper, we propose HYBRID, a scheduler that switches between preemptive and non-preemptive modes by providing a fixed number of computing resources, sticky slots, that offer uninterruptible task execution. In addition to preserving the performance advantages that conventional preemptive priority scheduling gives high-priority workload, HYBRID aims to reduce the repetitive evictions, response times, and wasted executions incurred by low-priority workload. Trace-driven simulation analysis shows that our proposed HYBRID scheduler outperforms conventional preemptive priority scheduling by improving the response time of low-priority workload by 15% and reducing wasted executions by 85%.

Available Media
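
The sticky-slot idea can be sketched as a placement rule in which a fixed subset of slots never gives up a running task, while the remaining slots stay preemptible by high-priority work. This is not the authors' HYBRID implementation; the task model, slot counts, and priorities below are invented for illustration.

```python
# Minimal sketch of the "sticky slot" idea from the abstract: sticky slots run
# tasks to completion, while regular slots remain preemptible by high-priority
# work. Not the authors' HYBRID scheduler; all values are invented.
class Slot:
    def __init__(self, sticky):
        self.sticky = sticky      # sticky slots never have their task evicted
        self.task = None          # (priority, task_id) or None

def place(task, slots):
    priority, _ = task
    for s in slots:               # prefer any free slot
        if s.task is None:
            s.task = task
            return "scheduled"
    if priority == "high":        # high-priority work may evict low-priority
        for s in slots:           # work, but only from non-sticky slots
            if not s.sticky and s.task and s.task[0] == "low":
                evicted = s.task
                s.task = task
                return "scheduled, evicted " + evicted[1]
    return "queued"

slots = [Slot(sticky=True), Slot(sticky=False), Slot(sticky=False)]
for t in [("low", "t1"), ("low", "t2"), ("low", "t3"), ("high", "t4")]:
    print(t, "->", place(t, slots))   # t4 evicts a low task on a regular slot
```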

Saving on Data Center Energy Bills with EDEALS: Electricity Demand-response Easy Adjusted Load Shifting

3:00 pm-3:30 pm

Will McFadden, University of Chicago; Anita Nikolich, Morgridge Institute; Ray Parpart and Birali Runesha, University of Chicago

Energy demand response presents a highly cost-effective means to improve the sustainability of data centers whenever there is flexibility in task scheduling. This paper presents an empirical study in the area of data center demand response, with the goal of cost savings on electricity bills for small to medium size data centers drawing 1–5 megawatts. Using the SLURM resource manager, we demonstrate a methodology for energy-aware load shifting by flexibly reducing compute cycles at times of peak energy demand. Simply reducing the pool of available servers during a brief period of peak energy demand results in tangible cost savings through reduced power consumption of the server cluster, with minimal performance degradation to users. We have developed a data processing pipeline, EDEALS, to determine the potential cost savings of partial data center shutdown to enable demand-response load shifting. As our baseline, we measured the power draw and job scheduling delay of a small-scale test cluster by varying available resources. We then model a production cluster's performance in response to a realistic energy constraint imposed by a utility provider. For our data center, we quantify a potential annual cost savings on electricity of 7% while causing only 16 hours of total increased wait time (0.1%) over the entire year.

Available Media
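
One way to realize "reducing the pool of available servers" with SLURM, as the abstract describes, is to drain a set of nodes for the duration of the utility's peak window and resume them afterwards. The sketch below is a hedged illustration rather than the EDEALS pipeline itself: the node range is hypothetical, and using scontrol drain/resume is one plausible mechanism, not necessarily the one the authors used.

```python
# Hedged sketch (not the EDEALS pipeline): temporarily shrink the pool of
# schedulable nodes during a utility's peak-demand window by draining them
# with SLURM's scontrol, then returning them to service afterwards. The node
# range and peak-window policy are assumptions for illustration.
import subprocess

PEAK_NODES = "node[01-08]"        # hypothetical nodes to take offline at peak

def set_node_state(nodes, state, reason="demand-response"):
    """Drain or resume a set of nodes via scontrol."""
    cmd = ["scontrol", "update", f"NodeName={nodes}", f"State={state}"]
    if state == "DRAIN":
        cmd.append(f"Reason={reason}")
    subprocess.run(cmd, check=True)

def enter_peak_window():
    # Running jobs finish, but no new jobs start on the drained nodes.
    set_node_state(PEAK_NODES, "DRAIN")

def exit_peak_window():
    set_node_state(PEAK_NODES, "RESUME")
```
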
3:30 pm–4:00 pm Saturday

Break with Refreshments

4:00 pm–5:30 pm Saturday

Architectural Support

Session Chair: Shaolei Ren, University of California, Riverside

Understanding the Impact of Cache Locations on Storage Performance and Energy Consumption of Virtualization Systems

4:00 pm-4:30 pm

Tao Lu, Ping Huang, and Xubin He, Virginia Commonwealth University; Ming Zhang, EMC Corporation

As per-server CPU cores and DRAM capacity continuously increase, the application density of virtualization platforms rises. High application density imposes tremendous pressure on storage systems. Layers of caches are deployed to improve storage performance. Owing to its manageability and transparency advantages, hypervisor-side caching is widely employed. However, hypervisor-side caches sit below the VM disk filesystems. Thus, the critical path of cache access involves the virtual I/O sub-path, which is expensive in operation latency and CPU cycles. The virtual I/O sub-path caps the throughput (IOPS) of the hypervisor-side cache and incurs additional energy consumption. It is viable to directly allocate spare cache resources such as the DRAM of a host machine to a VM for building a VM-side cache, so as to obviate the I/O virtualization overheads. In this work, we quantitatively compare the performance and energy efficiency of VM-side and hypervisor-side caches based on DRAM, SATA SSD, and PCIe SSD devices. Insights from this work can direct designs of high-performance and energy-efficient virtualization systems in the future.

Available Media

Using Memory-style Storage to Support Fault Tolerance in Data Centers

4:30 pm-5:00 pm

Xiao Liu, University of California, Santa Cruz; Qing Yi, University of Colorado at Colorado Springs; Jishen Zhao, University of California, Santa Cruz

Next-generation nonvolatile memories combine the byte-addressability and high performance of memory with the nonvolatility of disk/flash. They promise emerging memory-style storage (MSS) systems that are directly attached to the memory bus, offering fast load/store access and data persistence in a single level of storage. MSS can be especially attractive in data centers, where fault tolerance support through storage systems is critical to performance and energy. Yet existing fault tolerance mechanisms, such as logging and checkpointing, are designed for slow block-level storage interfaces; their design choices are not wholly suitable for MSS. The goal of this work is to explore efficient fault tolerance techniques that exploit the fast memory interface and the nature of single-level storage. Our preliminary exploration shows that, by reducing data duplication and increasing application parallelism, such techniques can substantially improve system performance and reduce energy consumption.

Available Media

Empirical Study of the Power Consumption of the x86-64 Instruction Decoder

5:00 pm-5:30 pm

Mikael Hirki, Helsinki Institute of Physics and Aalto University; Zhonghong Ou, Beijing University of Posts and Telecommunications; Kashif N. Khan, Helsinki Institute of Physics and Aalto University; Jukka K. Nurminen, Helsinki Institute of Physics, Aalto University, and VTT Research; Tapio Niemi, Helsinki Institute of Physics

It has been a common myth that x86-64 processors suffer in terms of energy efficiency because of their complex instruction set. In this paper, we aim to investigate whether this myth holds true, and determine the power consumption of the instruction decoders of an x86-64 processor. To that end, we design a set of microbenchmarks that specifically trigger the instruction decoders by exceeding the capacity of the decoded instruction cache. We measure the power consumption of the processor package using a hardware-level energy metering interface called the Running Average Power Limit (RAPL), which is supported in the latest Intel architectures. We leverage linear regression modeling to break down the power consumption of each processor component, including the instruction decoders. Through a comprehensive set of experiments, we demonstrate that the instruction decoders can consume between 3% and 10% of the package power when the capacity of the decoded instruction cache is exceeded. Overall, this is a somewhat limited amount of power compared with the other components in the processor core, e.g., the L2 cache. We hope our findings can shed light on the future optimization of processor architectures.

Available Media
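
For readers who want to reproduce a simplified version of the measurement setup, the sketch below reads package energy from Linux's powercap RAPL counters and fits an ordinary least-squares model of package power against per-component activity rates. It is only a rough illustration of the approach described in the abstract: the sysfs path is the standard intel-rapl package-0 counter, but the feature columns, benchmark values, and two-component breakdown are invented and are much cruder than the paper's model.

```python
# Illustrative sketch, not the paper's methodology: read package energy from
# Linux's powercap RAPL interface and fit a least-squares model of power as a
# linear function of per-component activity rates.
import time
import numpy as np

RAPL_PKG = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package-0 counter

def package_power(interval_s=1.0):
    """Average package power (watts) over an interval, from RAPL energy."""
    def read_uj():
        with open(RAPL_PKG) as f:
            return int(f.read())
    e0 = read_uj()
    time.sleep(interval_s)
    e1 = read_uj()
    return (e1 - e0) / 1e6 / interval_s   # ignores counter wraparound

# Hypothetical per-benchmark activity rates (events per cycle) and measured
# package power; the regression coefficients estimate per-component power.
X = np.array([[0.9, 0.05, 1.0],    # columns: decoder uops, L2 accesses, const
              [0.1, 0.40, 1.0],
              [0.5, 0.20, 1.0],
              [0.7, 0.10, 1.0]])
y = np.array([22.0, 18.5, 20.0, 21.0])          # measured watts (invented)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("estimated per-component power coefficients:", coef)
```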