FAST '13
Data Deduplication: Technologies, Trends, and Challenges
Crystal Room
This tutorial introduces the state of the art in data deduplication systems for storage. Most of the material is presented in a self-contained way, but we expect attendees to have some background in the basic concepts of storage systems.
The storage market is witnessing unprecedented growth, with enterprise storage growing 50–60% per year and cloud storage growing even faster. Data deduplication is the #1 feature customers ask for when they invest in storage solutions. It detects and eliminates redundancies in data, and its benefits apply both to storage capacity ("data at rest") and to network bandwidth ("data on wire"). Besides taming the growth in storage total cost of ownership, the capacity savings can make high-IOPS devices such as flash-based SSDs more feasible in terms of cost. The bandwidth savings can help mitigate WAN bottlenecks, enabling user-to-cloud and hybrid private-public cloud storage scenarios.
Backup data deduplication has been around for about a decade, championed by early startups in the space such as Data Domain. Recent developments bring data deduplication to the faster and more expensive primary storage tier, where space savings are more valuable: they translate into reductions in the amount of data that needs to be replicated, geo-replicated, cached, backed up, and transferred over the network.
In this tutorial, we will survey technologies in the data deduplication area at both the algorithmic and systems levels. We will follow the progression of ideas over time and identify current trends in research and industry. We will outline the challenges that need to be addressed going forward. Topics covered will include research aspects of the entire data deduplication pipeline—data chunking, data indexing, primary data access, storage maintenance operations—as well as case studies of commercially deployed systems.
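As a rough illustration of the chunking and indexing stages named above, the following minimal sketch splits a byte string into chunks, fingerprints each chunk, and stores every unique chunk only once. It is not taken from the tutorial: the function names `dedup_store` and `restore`, the fixed 4 KB chunk size, the SHA-256 fingerprint, and the in-memory dictionary index are all assumptions chosen for brevity. Deployed systems often use content-defined (variable-size) chunking and persistent indexes instead.

```python
import hashlib


def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Hypothetical helper for illustration only. Returns (index, recipe):
    index maps a chunk fingerprint to the chunk bytes, and recipe is the
    ordered list of fingerprints needed to rebuild the original data.
    """
    index = {}   # chunk index: fingerprint -> chunk bytes
    recipe = []  # ordered fingerprints describing the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Assumption: SHA-256 serves as the chunk fingerprint.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        # A chunk whose fingerprint is already indexed is not stored again.
        index.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return index, recipe


def restore(index, recipe) -> bytes:
    """Rebuild the original data by following the recipe through the index."""
    return b"".join(index[fingerprint] for fingerprint in recipe)
```

For an input made of four 4 KB chunks of which three are identical, `dedup_store` would record four recipe entries but store only two chunks, and `restore` would return the original bytes.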
Intended audience: graduate students and researchers working in the areas of storage, enterprise computing, cloud computing, and enterprise/Web services, as well as practicing storage professionals in the technology industry, especially in the enterprise and cloud data center space.