Alexander Visheratin, Alexey Struckov, Semen Yufa, Alexey Muratov, Denis Nasonov, and Nikolay Butakov, ITMO University; Yury Kuznetsov and Michael May, Siemens
The rapid development of scientific and industrial areas that rely on time series data processing raises the demand for storage that can process tens to hundreds of terabytes of data efficiently. Efficiency here means not only the speed of data processing operations but also the volume of stored data and the operational costs of deploying the storage in a production environment such as the cloud. In this paper, we propose a concept for storing and indexing numerical time series that allows creating compact data representations optimized for cloud storage and performing typical operations (uploading, extracting, sampling, and statistical aggregations) at high speed. Our modular database that implements the proposed approach, Peregreen, can achieve a throughput of 3 million entries per second for uploading and 48 million entries per second for extraction in Amazon EC2 while using only Amazon S3 as backend storage for all the data.
@inproceedings{visheratin2020peregreen,
author = {Alexander Visheratin and Alexey Struckov and Semen Yufa and Alexey Muratov and Denis Nasonov and Nikolay Butakov and Yury Kuznetsov and Michael May},
title = {Peregreen {\textendash} modular database for efficient storage of historical time series in cloud environments},
booktitle = {2020 USENIX Annual Technical Conference (USENIX ATC 20)},
year = {2020},
isbn = {978-1-939133-14-4},
pages = {589--601},
url = {https://www.usenix.org/conference/atc20/presentation/visheratin},
publisher = {USENIX Association},
month = jul
}