The Future of the Past: Challenges in Archival Storage

Ethan L. Miller, University of California, Santa Cruz, and Pure Storage

Abstract: 

Our civilization is built on passing information to future generations, a process that has been evolving for tens of thousands of years. As data has become digital and the volume has exploded, however, techniques for long-term archival storage of information have not kept pace, a problem that may threaten our society's ability to continue to build on existing knowledge. This talk will describe the challenges of preserving digital data for tens to thousands of years, both physical (data media) and semantic (understanding old data). It will discuss recent advances in archival storage technologies such as DNA and glass-based storage, and their implications for data preservation. It will also touch on challenges to preserving and protecting information, not just bits, including security and integrity of long-term stored data and the ability to interpret data in fifty years. The hope is that this talk will inspire the computer systems community to consider ways to attack these problems before it's too late to preserve the information to which we have access today.

Ethan L. Miller is a Professor in the Computer Science and Engineering Department at the University of California, Santa Cruz, where he holds the Veritas Presidential Chair in Storage. He was the Director of the NSF IUCRC Center for Research in Storage Systems (CRSS) from 2013 to 2020, and was a founding member of the Storage Systems Research Center (SSRC) at UC Santa Cruz. He is a Fellow of the IEEE and an ACM Distinguished Scientist, and his publications have received multiple Best Paper awards. Prof. Miller received an Sc.B. from Brown University in 1987 and a Ph.D. from UC Berkeley in 1995, and has been on the UC Santa Cruz faculty since 2000. He has co-authored over 160 papers on a range of topics in file and storage systems, operating systems, parallel and distributed systems, information retrieval, and computer security. He was a member of the team that developed Ceph, a scalable high-performance distributed file system for scientific computing that is now being adopted by several high-end computing organizations. His work on reliability and security for distributed storage is also widely recognized, as is his work on secure, efficient long-term archival storage and scalable metadata systems.

His current research projects, which are funded by the National Science Foundation and industry support for the CRSS and SSRC, include system support for byte-addressable non-volatile memory (Twizzler), archival storage systems, and reliable and secure storage systems. Prof. Miller has worked closely with industry to help move research results into commercial use at companies such as NetApp and Veritas, and has been working with Pure Storage since 2009 to develop reliable high-performance flash-based storage systems. Additional information is available at https://www.crss.ucsc.edu/person/elm.html.

BibTeX
@conference {254507,
author = {Ethan L. Miller},
title = {The Future of the Past: Challenges in Archival Storage},
year = {2020},
publisher = {USENIX Association},
month = jul
}