LOOM: Optimal Aggregation Overlays for In-Memory Big Data Processing
William Culhane, Kirill Kogan, Chamikara Jayalath, and Patrick Eugster, Purdue University
Aggregation underlies the distillation of information from big data. Many well-known basic operations, including top-k matching and word count, hinge on fast aggregation across large datasets. Common frameworks, including MapReduce, support aggregation but do not explicitly consider or optimize it. Optimizing aggregation, however, becomes even more relevant in recent “online” approaches to expressive big data analysis, which store data in main memory across nodes. This shifts the bottlenecks from disk I/O to distributed computation and network communication, and significantly increases the impact of aggregation time on total job completion time.
This paper presents LOOM, a (sub)system for efficient big data aggregation for use within big data analysis frameworks. LOOM efficiently supports two-phased (sub)computations consisting of a first phase performed on individual data subsets (e.g., word count, top-k matching), followed by a second aggregation phase that consolidates the individual results of the first phase (e.g., count sum, top-k). Using characteristics of an aggregation function, LOOM constructs a specifically configured aggregation overlay to minimize aggregation costs. We present optimality heuristics and experimentally demonstrate the benefits of optimizing aggregation overlays in this way, using microbenchmarks and real-world examples.
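The two-phase pattern described in the abstract can be illustrated with a small word-count sketch. This is not LOOM's implementation: the function names (`local_word_count`, `merge_counts`, `aggregate_tree`) and the fixed `fan_in` parameter are assumptions made for illustration, and the tree below simply merges partial results in groups of `fan_in` per level, whereas LOOM is described as configuring the overlay from characteristics of the aggregation function.

```python
from collections import Counter
from functools import reduce


def local_word_count(chunk):
    """Phase 1: count words within one data subset."""
    return Counter(chunk.split())


def merge_counts(a, b):
    """Aggregation function: combine two partial word counts."""
    return a + b


def aggregate_tree(partials, fan_in, merge):
    """Phase 2: merge partial results level by level, fan_in at a time.

    Each pass over `level` models one layer of a tree-shaped overlay
    in which every interior node combines up to `fan_in` children.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not partials:
        raise ValueError("need at least one partial result")
    level = list(partials)
    while len(level) > 1:
        level = [
            reduce(merge, level[i:i + fan_in])
            for i in range(0, len(level), fan_in)
        ]
    return level[0]


# Hypothetical input: four data subsets, one per (imagined) node.
chunks = ["a b a", "b c", "a c c", "b"]
partials = [local_word_count(c) for c in chunks]
total = aggregate_tree(partials, fan_in=2, merge=merge_counts)
```

Because the merge function here is associative and commutative, changing `fan_in` changes only the shape of the tree (its depth and the work per node), not the final result. That shape is the kind of parameter an aggregation overlay would tune.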
author = {William Culhane and Kirill Kogan and Chamikara Jayalath and Patrick Eugster},
title = {{LOOM}: Optimal Aggregation Overlays for {In-Memory} Big Data Processing},
booktitle = {6th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 14)},
year = {2014},
address = {Philadelphia, PA},
url = {https://www.usenix.org/conference/hotcloud14/workshop-program/presentation/culhane},
publisher = {USENIX Association},
month = jun
}