Finesse: Fine-Grained Feature Locality based Fast Resemblance Detection for Post-Deduplication Delta Compression

Authors: 

Yucheng Zhang, Hubei University of Technology; Wen Xia, Harbin Institute of Technology, Shenzhen & Peng Cheng Laboratory; Dan Feng, WNLO, School of Computer, Huazhong University of Science and Technology; Hong Jiang, University of Texas at Arlington; Yu Hua and Qiang Wang, WNLO, School of Computer, Huazhong University of Science and Technology

Abstract: 

In storage systems, delta compression is often used as a complementary data reduction technique for data deduplication because it can eliminate redundancy among non-duplicate but highly similar chunks. Currently, what we call 'N-transform Super-Feature' (N-transform SF) is the most popular and widely used approach to computing data similarity for detecting delta compression candidates. However, our observations suggest that N-transform SF is compute-intensive, because it must linearly transform each Rabin fingerprint of a data chunk N times to obtain N features. It can be simplified by exploiting the fine-grained feature locality that exists among highly similar chunks, which eliminates the time-consuming linear transformations. Therefore, we propose Finesse, a fine-grained feature-locality-based fast resemblance detection approach that divides each chunk into several fixed-sized subchunks, computes features from these subchunks individually, and then groups the features into super-features. Experimental results show that, compared with the state-of-the-art N-transform SF approach, Finesse accelerates the similarity computation for resemblance detection by 3.2× to 3.5× and increases the final throughput of a deduplicated and delta compressed prototype system by 41% to 85%, while achieving comparable compression ratios.
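
The three steps named in the abstract (split a chunk into fixed-sized subchunks, compute one feature per subchunk, group the features into super-features) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation. Several details are assumptions that the abstract does not specify: a simple polynomial rolling hash stands in for Rabin fingerprints, the feature of a subchunk is taken to be its maximum window hash, the counts (12 features, 3 super-features) and the window size are arbitrary, and consecutive features are grouped and hashed with SHA-1. All function names are hypothetical.

```python
import hashlib

WINDOW = 48            # sliding-window size in bytes (assumed value)
BASE = 257             # polynomial base for the stand-in rolling hash
MOD = (1 << 61) - 1    # modulus; keeps every hash below 2**61


def max_window_hash(data: bytes, window: int = WINDOW) -> int:
    """Return the largest rolling-hash value over all windows of `data`.

    A polynomial rolling hash is used here in place of Rabin
    fingerprints (an assumption made for brevity).
    """
    if not data:
        return 0
    w = min(window, len(data))
    top = pow(BASE, w - 1, MOD)        # weight of the oldest byte in a window
    h = 0
    for b in data[:w]:
        h = (h * BASE + b) % MOD
    best = h
    for i in range(w, len(data)):
        # Slide the window: drop the oldest byte, append the newest.
        h = ((h - data[i - w] * top) * BASE + data[i]) % MOD
        best = max(best, h)
    return best


def subchunk_features(chunk: bytes, n_features: int = 12) -> list[int]:
    """Split `chunk` into fixed-sized subchunks and take one feature from each."""
    size = -(-len(chunk) // n_features)    # ceiling division
    return [
        max_window_hash(chunk[i * size:(i + 1) * size])
        for i in range(n_features)
    ]


def super_features(chunk: bytes, n_features: int = 12, n_super: int = 3) -> list[int]:
    """Group consecutive features and hash each group into one super-feature.

    The grouping scheme is an assumption; the abstract only says that
    features are grouped into super-features.
    """
    feats = subchunk_features(chunk, n_features)
    per_group = n_features // n_super
    result = []
    for g in range(n_super):
        group = feats[g * per_group:(g + 1) * per_group]
        packed = b"".join(f.to_bytes(8, "big") for f in group)
        result.append(int.from_bytes(hashlib.sha1(packed).digest()[:8], "big"))
    return result


def resembles(a: bytes, b: bytes) -> bool:
    """Treat two chunks as delta-compression candidates if any super-feature matches."""
    return any(x == y for x, y in zip(super_features(a), super_features(b)))
```

In this sketch, overwriting a byte in one subchunk can change at most that subchunk's feature, so only the super-feature containing it is affected and the remaining super-features still match. No per-fingerprint linear transformation is involved: each feature comes from a single pass over its own subchunk.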

FAST '19 Open Access Sponsored by NetApp

BibTeX
@inproceedings {227822,
author = {Yucheng Zhang and Wen Xia and Dan Feng and Hong Jiang and Yu Hua and Qiang Wang},
title = {Finesse: {Fine-Grained} Feature Locality based Fast Resemblance Detection for {Post-Deduplication} Delta Compression},
booktitle = {17th USENIX Conference on File and Storage Technologies (FAST 19)},
year = {2019},
isbn = {978-1-939133-09-0},
address = {Boston, MA},
pages = {121--128},
url = {https://www.usenix.org/conference/fast19/presentation/zhang},
publisher = {USENIX Association},
month = feb
}
