Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance

Authors: 

Zhichao Cao, University of Minnesota; Shiyong Liu, Ocean University of China; Fenggang Wu, University of Minnesota; Guohua Wang, South China University of Technology; Bingzhe Li and David H.C. Du, University of Minnesota

Abstract: 

Data deduplication is an effective way of improving storage space utilization. The data generated by deduplication is persistently stored in data chunks or data containers (a container consists of a few hundred to a few thousand data chunks). The data restore process is rather slow due to data fragmentation and read amplification. To speed up the restore process, data chunk rewrite schemes (a rewrite stores a duplicate data chunk again) have been proposed to improve data chunk locality and reduce the number of container reads needed to restore the original data. However, rewrites decrease the deduplication ratio, since more storage space is used to store the duplicate data chunks.
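To make the restore cost concrete, the sketch below models a restore as replaying a backup recipe (the ordered list of container IDs holding each chunk) through a small LRU cache of whole containers, where every cache miss counts as one container read. The recipe format, the LRU policy, the equal-size-chunk assumption, and the function names `count_container_reads` and `dedup_ratio` are illustrative assumptions, not details taken from the paper.

```python
from collections import OrderedDict


def count_container_reads(recipe, cache_size):
    """Count container reads needed to restore a stream (toy model).

    recipe: container IDs, one per chunk, in restore order
            (hypothetical format used only for this illustration).
    cache_size: number of whole containers an LRU cache can hold (assumed policy).
    """
    cache = OrderedDict()
    reads = 0
    for cid in recipe:
        if cid in cache:
            cache.move_to_end(cid)  # hit: mark as most recently used
            continue
        reads += 1  # miss: the whole container must be read from storage
        cache[cid] = True
        if len(cache) > cache_size:
            cache.popitem(last=False)  # evict the least recently used container
    return reads


def dedup_ratio(logical_chunks, stored_chunks):
    """Logical size over physically stored size, assuming equal-size chunks."""
    return logical_chunks / stored_chunks
```

In this toy model, a recipe that alternates between a new container 10 and old containers 1 to 4, `[10, 1, 10, 2, 10, 3, 10, 4]`, needs 5 container reads with `cache_size=2`. If the four scattered duplicates were rewritten into container 10, the recipe becomes `[10] * 8` and needs a single read. The price is storage: a stream of 100 chunks that deduplicates to 40 stored chunks has a ratio of 2.5, and rewriting 4 duplicates raises the stored count to 44 and lowers the ratio to about 2.27.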

To remedy this, we focus on reducing the data fragmentation and read amplification of container-based deduplication systems. We first propose a flexible container referenced count based rewrite scheme, which achieves a better tradeoff between the deduplication ratio and the number of required container reads than capping, an existing rewrite scheme. To further improve the accuracy of rewrite candidate selection, we propose a sliding look-back window based design, which makes more accurate rewrite decisions by considering the caching effect, data chunk locality, and data chunk closeness in the current and future windows. According to our evaluation, our proposed approach always achieves higher restore performance than capping, especially when the reduction in deduplication ratio is small.
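The difference between a fixed cap and a referenced-count rule can be sketched on a single segment of the backup stream. Capping, as it is commonly described, keeps duplicates only in the `cap` containers that the segment references most and rewrites duplicates stored anywhere else; the threshold variant below instead rewrites duplicates whose container is referenced by fewer than `min_refs` chunks in the segment. Both functions, their parameters, and the segment representation are hypothetical simplifications for illustration only; the paper's flexible scheme and its sliding look-back window (which also weigh caching effects and future windows) are more involved than this.

```python
from collections import Counter


def capping_rewrites(segment, cap):
    """Indices of duplicate chunks to rewrite under a capping-style rule (sketch).

    segment: container ID of each chunk in one segment; None marks a unique
             chunk that is stored anyway (hypothetical format).
    cap: maximum number of old containers the segment may reference.
    """
    counts = Counter(cid for cid in segment if cid is not None)
    # Keep the `cap` most-referenced containers; equal counts keep first-seen order.
    kept = {cid for cid, _ in counts.most_common(cap)}
    return [i for i, cid in enumerate(segment)
            if cid is not None and cid not in kept]


def threshold_rewrites(segment, min_refs):
    """Indices of duplicates whose container has fewer than `min_refs` references
    in this segment (a hypothetical referenced-count rule, not the paper's exact one)."""
    counts = Counter(cid for cid in segment if cid is not None)
    return [i for i, cid in enumerate(segment)
            if cid is not None and counts[cid] < min_refs]
```

For the segment `[1, 1, 1, 1, 2, 2, 2, 3, 3, 4]`, `capping_rewrites(seg, cap=2)` returns `[7, 8, 9]`, rewriting both chunks in container 3 solely because of the cap, while `threshold_rewrites(seg, min_refs=2)` returns only `[9]`. Whether rewriting container 3's chunks actually saves a read depends on what the restore cache already holds, which is the caching effect the abstract says the sliding look-back window takes into account.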

FAST '19 Open Access Sponsored by NetApp

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.

BibTeX
@inproceedings {227804,
author = {Zhichao Cao and Shiyong Liu and Fenggang Wu and Guohua Wang and Bingzhe Li and David H.C. Du},
title = {Sliding {Look-Back} Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance},
booktitle = {17th USENIX Conference on File and Storage Technologies (FAST 19)},
year = {2019},
isbn = {978-1-939133-09-0},
address = {Boston, MA},
pages = {129--142},
url = {https://www.usenix.org/conference/fast19/presentation/cao},
publisher = {USENIX Association},
month = feb
}
