Gennady Pekhimenko, University of Toronto; Chuanxiong Guo, Bytedance Inc.; Myeongjae Jeon, Microsoft Research; Peng Huang, Johns Hopkins University; Lidong Zhou, Microsoft Research
This work is the first systematic investigation of stream processing with data compression: we have not only identified a set of factors that influence the benefits and overheads of compression, but have also demonstrated that compression can be effective for stream processing, both in the ability to process in larger windows and in throughput. This is done through a series of (i) optimizations on a stream engine itself to remove major sources of inefficiency, which leads to an order-of-magnitude improvement in throughput (ii) optimizations to reduce the cost of (de)compression, including hardware acceleration, and (iii) a new technique that allows direct execution on compressed data, that leads to a further 50% improvement in throughout. Our evaluation is performed on several real-world scenarios in cloud analytics and troubleshooting, with both microbenchmarks and production stream processing systems.
Open Access Media
USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone. Support USENIX and our commitment to Open Access.
author = {Gennady Pekhimenko and Chuanxiong Guo and Myeongjae Jeon and Peng Huang and Lidong Zhou},
title = {{TerseCades}: Efficient Data Compression in Stream Processing},
booktitle = {2018 USENIX Annual Technical Conference (USENIX ATC 18)},
year = {2018},
isbn = {978-1-939133-01-4},
address = {Boston, MA},
pages = {307--320},
url = {https://www.usenix.org/conference/atc18/presentation/pekhimenko},
publisher = {USENIX Association},
month = jul
}