Making Sense of Performance in Data Analytics Frameworks
Kay Ousterhout, University of California, Berkeley; Ryan Rasti, University of California, Berkeley, International Computer Science Institute, and VMware; Sylvia Ratnasamy, University of California, Berkeley; Scott Shenker, University of California, Berkeley, and International Computer Science Institute; Byung-Gon Chun, Seoul National University
There has been much research devoted to improving the performance of data analytics frameworks, but comparatively little effort has been spent systematically identifying the performance bottlenecks of these systems. In this paper, we develop blocked time analysis, a methodology for quantifying performance bottlenecks in distributed computation frameworks, and use it to analyze the Spark framework’s performance on two SQL benchmarks and a production workload. Contrary to our expectations, we find that (i) CPU (and not I/O) is often the bottleneck, (ii) improving network performance can improve job completion time by a median of at most 2%, and (iii) the causes of most stragglers can be identified.
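To make the idea behind blocked time analysis more concrete, the following Python snippet is a simplified sketch under our own assumptions, not the authors' implementation. It assumes that each task reports its measured runtime and the portion of that runtime spent blocked on a single resource (for example, the network). It also assumes that each stage's tasks are replayed in their original order on a fixed number of slots, and that stages run strictly one after another. The names `Task`, `stage_completion_time`, `job_completion_time`, and `estimated_improvement` are hypothetical. The sketch removes each task's blocked time, replays every stage, and compares the resulting job completion time with the original.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical per-task record: measured runtime and time blocked on one resource."""
    runtime: float  # seconds the task took in the measured run
    blocked: float  # seconds of `runtime` spent blocked on the resource (e.g. network)


def stage_completion_time(durations: list[float], slots: int) -> float:
    """Replay a stage's tasks, in order, on `slots` parallel slots (slots >= 1).

    Each task starts on whichever slot frees up first (greedy list scheduling);
    the stage finishes when its last task finishes.
    """
    free_at = [0.0] * min(slots, len(durations))  # time at which each slot becomes free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        heapq.heappush(free_at, start + d)
    return max(free_at, default=0.0)


def job_completion_time(stages: list[list[Task]], slots: int, remove_blocked: bool) -> float:
    """Sum of stage completion times, assuming stages run strictly one after another."""
    total = 0.0
    for tasks in stages:
        durations = [t.runtime - t.blocked if remove_blocked else t.runtime for t in tasks]
        total += stage_completion_time(durations, slots)
    return total


def estimated_improvement(stages: list[list[Task]], slots: int) -> float:
    """Fractional reduction in job completion time if the resource never blocked any task."""
    original = job_completion_time(stages, slots, remove_blocked=False)
    without_blocking = job_completion_time(stages, slots, remove_blocked=True)
    return (original - without_blocking) / original


if __name__ == "__main__":
    # Two-stage job on 2 slots: each first-stage task spends 1 s of its 4 s blocked.
    job = [[Task(4.0, 1.0), Task(4.0, 1.0)], [Task(2.0, 0.0)]]
    print(round(estimated_improvement(job, slots=2), 3))  # 0.167 (6 s -> 5 s)

    # Blocked time that does not shorten the stage: it still takes 8 s either way.
    stage = [[Task(4.0, 2.0), Task(4.0, 0.0), Task(4.0, 2.0), Task(4.0, 0.0)]]
    print(estimated_improvement(stage, slots=2))  # 0.0
```

The second case shows why the sketch replays the schedule instead of simply subtracting total blocked time. Four seconds of blocked time disappear, yet the stage still takes 8 s because the last unblocked 4 s task cannot start until a slot frees up at 4 s.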
author = {Kay Ousterhout and Ryan Rasti and Sylvia Ratnasamy and Scott Shenker and Byung-Gon Chun},
title = {Making Sense of Performance in Data Analytics Frameworks},
booktitle = {12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15)},
year = {2015},
isbn = {978-1-931971-21-8},
address = {Oakland, CA},
pages = {293--307},
url = {https://www.usenix.org/conference/nsdi15/technical-sessions/presentation/ousterhout},
publisher = {USENIX Association},
month = may
}