Understanding De-identification Guidance and Practices for Research Data

Authors: 

Wentao Guo and Aditya Kishore, University of Maryland; Paige Pepitone, NORC at the University of Chicago; Adam Aviv, The George Washington University; Michelle Mazurek, University of Maryland

Abstract: 

Publishing de-identified research data is beneficial for transparency and the advancement of knowledge, but it creates the risk that research subjects could be re-identified, exposing private information. De-identifying data is difficult, with evolving techniques and mixed incentives. We conducted a thematic analysis of 38 recent online de-identification guides, characterizing the content of these guides and identifying concerning patterns, including inconsistent definitions of key terms, gaps in coverage of threats, and areas for improvement in usability. We also interviewed 26 researchers with experience de-identifying and reviewing data for publication, analyzing how and why most of these researchers may fall short of protecting against state-of-the-art re-identification attacks.

Open Access Media

USENIX is committed to Open Access to the research presented at our events. Papers and proceedings are freely available to everyone once the event begins. Any video, audio, and/or slides that are posted after the event are also free and open to everyone.