Data DeDuplication: Technologies, Trends, and Challenges

Description:

This tutorial introduces the state of the art in data deduplication systems for storage. Most of the material will be presented in a self-contained way, although we expect attendees to have some background in the basic concepts of storage systems.

The storage market is witnessing unprecedented growth, with enterprise storage growing 50–60% per year and cloud storage growing even faster. Data deduplication is the #1 feature that customers ask for when they invest in storage solutions. Data deduplication detects and eliminates redundancies in data, and its benefits apply to both storage capacity ("data at rest") and network bandwidth ("data on wire"). In addition to taming the growth in storage total cost of ownership, the capacity savings can help make high-IOPS devices such as flash-based SSDs more feasible in terms of cost. The network bandwidth savings can help mitigate WAN bottlenecks, thus enabling user-to-cloud and hybrid private-public cloud storage scenarios.

Backup data deduplication has been around for about a decade, championed by early startups in the space such as Data Domain. Recent developments bring data deduplication to the faster and more expensive primary storage tier, where deduplication space savings are more valuable because they also reduce the amount of data that needs to be replicated, geo-replicated, cached, backed up, and transferred over the network.

In this tutorial, we will survey technologies in the data deduplication area at both the algorithmic and systems levels. We will follow the progression of ideas over time and identify current trends in research and industry. We will outline the challenges that need to be addressed going forward. Topics covered will include research aspects of the entire data deduplication pipeline—data chunking, data indexing, primary data access, storage maintenance operations—as well as case studies of commercially deployed systems.
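As a rough illustration of the chunking and indexing stages named above, the following is a minimal sketch and not the method of any particular system covered in the tutorial. It assumes a simple Gear-style rolling hash for content-defined chunking and an in-memory dictionary keyed by SHA-256 fingerprints as the chunk index. The names `chunk`, `DedupStore`, `GEAR`, and the size parameters are hypothetical and chosen only for this example.

```python
import hashlib
import random

# Hypothetical chunk-size parameters (bytes); AVG_SIZE must be a power of two.
MIN_SIZE, AVG_SIZE, MAX_SIZE = 256, 1024, 4096

# Fixed pseudo-random table mapping each byte value to a 32-bit number.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk(data: bytes) -> list:
    """Split data into chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    mask = AVG_SIZE - 1
    for i, b in enumerate(data):
        # Rolling hash: older bytes are shifted out of the low bits.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        # Cut when the low bits of the hash are zero (content-defined),
        # or when the chunk reaches the maximum allowed size.
        if (length >= MIN_SIZE and (h & mask) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class DedupStore:
    """Toy chunk store: each unique chunk is kept once, keyed by fingerprint."""

    def __init__(self):
        self.index = {}  # fingerprint (hex string) -> chunk bytes

    def put(self, data: bytes) -> list:
        """Store data and return its recipe (ordered list of fingerprints)."""
        recipe = []
        for c in chunk(data):
            fp = hashlib.sha256(c).hexdigest()
            self.index.setdefault(fp, c)  # only new chunks consume space
            recipe.append(fp)
        return recipe

    def get(self, recipe: list) -> bytes:
        """Rebuild the original data from its recipe."""
        return b"".join(self.index[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(c) for c in self.index.values())


if __name__ == "__main__":
    payload = random.Random(1).randbytes(20000)
    store = DedupStore()
    first = store.put(payload)
    size_after_first = store.stored_bytes()
    second = store.put(payload)  # a duplicate write adds no new chunks
    print(size_after_first, store.stored_bytes(), first == second)
```

Because boundaries depend on content rather than on fixed offsets, a small insertion near the start of a file tends to change only nearby chunks, which is the usual motivation for content-defined over fixed-size chunking. A production index would also need to handle persistence, memory footprint, and reference counting for garbage collection, all of which this sketch omits.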
