Metadata Considered Harmful…to Deduplication
Xing Lin, University of Utah; Fred Douglis and Jim Li, EMC Corporation; Xudong Li, Nankai University; Robert Ricci, University of Utah; Stephen Smaldone and Grant Wallace, EMC Corporation
Deduplication is widely used to improve space efficiency in storage systems. While much attention has been paid to making the process of deduplication fast and scalable, the effectiveness of deduplication can vary dramatically depending on the data stored. We show that many file formats suffer from a fundamental design property that is incompatible with deduplication: they intersperse metadata with data in ways that make otherwise identical data appear different. We examine three models for improving deduplication in the presence of embedded metadata: deduplication-friendly data formats, application-level post-processing, and format-aware deduplication. Working with real-world file formats and datasets, we find that separating metadata from data improves deduplication ratios significantly, in some cases by as much as 5.6×.
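The effect described in the abstract can be illustrated with a toy sketch. It is not the authors' method or any real file format: the 512-byte header layout, the helper names (`header`, `interleaved`, `separated`, `dedup_ratio`), and fixed-size 4 KiB chunking with SHA-256 fingerprints are all assumptions made for illustration. Two "versions" of a file carry identical data blocks but different per-block metadata; the sketch compares the deduplication ratio when the metadata is interleaved with the data against the ratio when it is stored as a separate stream.

```python
import hashlib

CHUNK = 4096    # fixed-size chunking (assumption; real systems often use variable-size chunks)
HEADER = 512    # hypothetical per-block metadata header size
BLOCK = CHUNK - HEADER  # data payload size, chosen so header + block fills one chunk


def chunks(stream, size=CHUNK):
    """Split a byte string into fixed-size chunks."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def dedup_ratio(streams, size=CHUNK):
    """Logical chunk count divided by unique chunk count across all streams."""
    total = 0
    unique = set()
    for stream in streams:
        for chunk in chunks(stream, size):
            total += 1
            unique.add(hashlib.sha256(chunk).digest())
    return total / len(unique)


def header(version, index):
    """Hypothetical metadata that changes between versions of the same file."""
    return f"v={version};blk={index}".encode().ljust(HEADER, b"\0")


def interleaved(blocks, version):
    """Layout A: each data block is preceded by its own metadata header."""
    return b"".join(header(version, i) + b for i, b in enumerate(blocks))


def separated(blocks, version):
    """Layout B: all metadata in one stream, all data in another."""
    meta = b"".join(header(version, i) for i in range(len(blocks)))
    data = b"".join(blocks)
    return meta, data


# Eight distinct, deterministic data blocks shared by both versions.
blocks = [hashlib.sha256(bytes([i])).digest() * (BLOCK // 32) for i in range(8)]

# Layout A: every chunk contains a version-specific header, so nothing matches.
ratio_interleaved = dedup_ratio([interleaved(blocks, 1), interleaved(blocks, 2)])

# Layout B: the data streams are byte-identical, so their chunks deduplicate.
meta1, data1 = separated(blocks, 1)
meta2, data2 = separated(blocks, 2)
ratio_separated = dedup_ratio([meta1, data1, meta2, data2])

print(f"interleaved: {ratio_interleaved:.2f}")
print(f"separated:   {ratio_separated:.2f}")
```

In this toy setup the interleaved layout yields no duplicate chunks at all, while the separated layout deduplicates every data chunk and leaves only the two small metadata chunks unique. The specific ratios are a consequence of the sizes chosen here and say nothing about the measurements reported in the paper.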
author = {Xing Lin and Fred Douglis and Jim Li and Xudong Li and Robert Ricci and Stephen Smaldone and Grant Wallace},
title = {Metadata Considered {Harmful{\textellipsis}to} Deduplication},
booktitle = {7th USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage 15)},
year = {2015},
address = {Santa Clara, CA},
url = {https://www.usenix.org/conference/hotstorage15/workshop-program/presentation/lin},
publisher = {USENIX Association},
month = jul
}