FastCDC: A Fast and Efficient Content-Defined Chunking Approach for Data Deduplication
Wen Xia, Huazhong University of Science and Technology and Sangfor Technologies Co., Ltd.; Yukun Zhou, Huazhong University of Science and Technology; Hong Jiang, University of Texas at Arlington; Dan Feng, Yu Hua, Yuchong Hu, Yucheng Zhang, and Qing Liu, Huazhong University of Science and Technology
Content-Defined Chunking (CDC) has been playing a key role in data deduplication systems in the past 15 years or so due to its high redundancy detection ability. However, existing CDC-based approaches introduce heavy CPU overhead because they declare the chunk cut-points by computing and judging the rolling hashes of the data stream byte by byte. In this paper, we propose FastCDC, a Fast and efficient CDC approach, that builds and improves on the latest Gear-based CDC approach, one of the fastest CDC methods to our knowledge. The key idea behind FastCDC is the combined use of three key techniques, namely, simplifying and enhancing the hash judgment to address our observed challenges facing Gear-based CDC, skipping sub-minimum chunk cut-point to further speed up CDC, and normalizing the chunk-size distribution in a small specified region to address the problem of the decreased deduplication ratio stemming from the cut-point skipping. Our evaluation results show that, by using a combination of the three techniques, FastCDC is about 10x faster than the best of open-source Rabin-based CDC, and about 3x faster than the state-of-the-art Gear- and AE-based CDC, while achieving nearly the same deduplication ratio as the classic Rabin-based approach.
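The three techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the size limits, the two mask widths, the seeded random Gear table, and the function names `next_cut` and `chunk` are all assumptions chosen for illustration. The sketch uses a Gear-style rolling hash with a simple mask test, skips hashing over the first `MIN_SIZE` bytes of each chunk, and switches from a stricter mask to a looser one once the chunk passes `NORMAL_SIZE`, which pulls chunk sizes toward that value.

```python
import random

# Hypothetical parameters for illustration only; not taken from the paper.
MIN_SIZE = 2 * 1024       # no cut-point is considered before this many bytes
NORMAL_SIZE = 8 * 1024    # point at which the mask is relaxed
MAX_SIZE = 64 * 1024      # a cut is forced here if none was found
MASK64 = (1 << 64) - 1

# Masks test upper bits of the 64-bit hash, which depend on more of the
# recent bytes than the low bits do. More mask bits means fewer cut-points.
MASK_HARD = ((1 << 15) - 1) << 49   # used before NORMAL_SIZE
MASK_EASY = ((1 << 11) - 1) << 53   # used after NORMAL_SIZE

# Gear table: one 64-bit value per byte value (seeded so runs are repeatable).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def next_cut(data, start):
    """Return the exclusive end offset of the chunk that begins at start."""
    if len(data) - start <= MIN_SIZE:
        return len(data)
    end = min(start + MAX_SIZE, len(data))
    normal = min(start + NORMAL_SIZE, end)
    fp = 0
    i = start + MIN_SIZE  # skip the sub-minimum region without hashing it
    while i < normal:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK64
        if fp & MASK_HARD == 0:
            return i + 1
        i += 1
    while i < end:
        fp = ((fp << 1) + GEAR[data[i]]) & MASK64
        if fp & MASK_EASY == 0:
            return i + 1
        i += 1
    return end


def chunk(data):
    """Split a bytes object into a list of content-defined chunks."""
    chunks = []
    pos = 0
    while pos < len(data):
        cut = next_cut(data, pos)
        chunks.append(data[pos:cut])
        pos = cut
    return chunks
```

Calling `chunk(payload)` on a `bytes` object returns chunks that concatenate back to `payload`. Every chunk except possibly the last is longer than `MIN_SIZE`, and none exceeds `MAX_SIZE`. In a deduplication system each chunk would then be fingerprinted (for example with a cryptographic hash) and stored only once.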
author = {Wen Xia and Yukun Zhou and Hong Jiang and Dan Feng and Yu Hua and Yuchong Hu and Qing Liu and Yucheng Zhang},
title = {{FastCDC}: A Fast and Efficient {Content-Defined} Chunking Approach for Data Deduplication},
booktitle = {2016 USENIX Annual Technical Conference (USENIX ATC 16)},
year = {2016},
isbn = {978-1-931971-30-0},
address = {Denver, CO},
pages = {101--114},
url = {https://www.usenix.org/conference/atc16/technical-sessions/presentation/xia},
publisher = {USENIX Association},
month = jun
}