On The [Ir]relevance of Network Performance for Data Processing
Animesh Trivedi, Patrick Stuedi, Jonas Pfefferle, Radu Stoica, Bernard Metzler, Ioannis Koltsidas, and Nikolas Ioannou, IBM Research, Zurich
Modern data processing frameworks are used in a variety of settings for a diverse set of workloads such as sorting, indexing, iterative computations, structured query processing, etc. As these frameworks run in a distributed environment, a natural question to ask is: how important is the network to the performance of these frameworks? Recent research in this field has led to contradictory results. One camp advocates the limited impact of networking performance on the overall performance of the framework. On the other hand, there is a large body of work on networking optimizations for data processing frameworks.
In this paper, we search for a better understanding of the matter. While answering the basic question concerning the importance of the network performance, our analysis raises new questions and points to previously unexplored or unnoticed avenues for performance optimizations. We take Apache Spark as a representative of a modern data-processing framework. However, to broaden the scope of our investigation, we also experiment with other frameworks such as Flink, PowerGraph or Timely. In our study, rather than analysing Spark-specific peculiarities, we look into procedures and subsystems that are common in any of these frameworks such as networking IO, shuffle data management, object (de)serialization, copies, job scheduling and coordination, etc. Nonetheless, we are aware that the roles of those individual components are different for the various systems, and we exercise caution when making generalized statements about the performance.
author = {Animesh Trivedi and Patrick Stuedi and Jonas Pfefferle and Radu Stoica and Bernard Metzler and Ioannis Koltsidas and Nikolas Ioannou},
title = {On The {[Ir]relevance} of Network Performance for Data Processing},
booktitle = {8th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 16)},
year = {2016},
address = {Denver, CO},
url = {https://www.usenix.org/conference/hotcloud16/workshop-program/presentation/trivedi},
publisher = {USENIX Association},
month = jun
}