Mingxing Zhang, Yongwei Wu, and Kang Chen, Tsinghua University; Xuehai Qian, University of Southern California; Xue Li and Weimin Zheng, Tsinghua University
Task partitioning of a graph-parallel system is traditionally considered equivalent to the graph partition problem. Such equivalence exists because the properties associated with each vertex/edge are normally considered indivisible. However, this assumption is not true for many Machine Learning and Data Mining (MLDM) problems: instead of a single value, a vector of data elements is defined as the property for each vertex/edge. This feature opens a new dimension for task partitioning because a vertex could be divided and assigned to different nodes.
To explore this new opportunity, this paper presents 3D partitioning, a novel category of task partition algorithms that significantly reduces network traffic for certain MLDM applications. Based on 3D partitioning, we build a distributed graph engine CUBE. Our evaluation results show that CUBE outperforms state-of-the-art graph-parallel system PowerLyra by up to 4:7x (up to 7:3x speedup against PowerGraph).
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Mingxing Zhang and Yongwei Wu and Kang Chen and Xuehai Qian and Xue Li and Weimin Zheng},
title = {Exploring the Hidden Dimension in Graph Processing},
booktitle = {12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16)},
year = {2016},
isbn = {978-1-931971-33-1},
address = {Savannah, GA},
pages = {285--300},
url = {https://www.usenix.org/conference/osdi16/technical-sessions/presentation/zhang-mingxing},
publisher = {USENIX Association},
month = nov
}