Transforming Data through Deidentification

Monday, June 03, 2024 - 3:15 pm–3:35 pm

Akshatha Gangadharaiah, Bijeeta Pal, and Sameera Ghayyur, Snap Inc.

Abstract: 

Text data shared by users is important for improving the user experience and product quality and for introducing new features. Here, text data does not mean private communications; it refers to data from features such as search queries and text captions on public content. Using text data for downstream tasks such as analysis, improving user safety, and training models can pose significant privacy risks, because the data may contain sensitive and private information about individuals. To address these challenges, we introduced a novel text deidentification workflow designed to improve privacy while maximizing utility for downstream tasks. The workflow has three phases: first, a PII redaction process systematically removes user-identifying attributes; second, an LLM rewrite modifies sentences to remove user-specific writing styles; and last, a validation process gauges the efficacy of the text deidentification. In this talk, we will go over the details of each phase of the workflow, along with an overview of its implementation using Temporal.
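
The three phases described in the abstract can be illustrated with a minimal Python sketch. This is not the speakers' implementation: the regex patterns, the function names (redact_pii, rewrite_style, validate, deidentify), and the stub_llm stand-in are all hypothetical, and the validation step here is only a simple re-scan for leftover PII rather than a real efficacy measure. Orchestration with Temporal is not shown.

```python
import re
from typing import Callable, Tuple

# Hypothetical, deliberately simple PII patterns (assumption: a real
# redaction phase would cover far more attribute types than these two).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Phase 1: replace user-identifying attributes with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


def rewrite_style(text: str, llm: Callable[[str], str]) -> str:
    """Phase 2: hand the redacted text to an LLM rewrite step.

    The llm argument is any callable from text to text; no specific
    model or vendor API is assumed.
    """
    return llm(text)


def validate(text: str) -> bool:
    """Phase 3 (simplified): pass only if no known PII pattern remains."""
    return not (EMAIL_RE.search(text) or PHONE_RE.search(text))


def deidentify(text: str, llm: Callable[[str], str]) -> Tuple[str, bool]:
    """Run the three phases in order and report whether validation passed."""
    rewritten = rewrite_style(redact_pii(text), llm)
    return rewritten, validate(rewritten)


def stub_llm(text: str) -> str:
    """Stand-in for an LLM rewrite: only collapses repeated whitespace."""
    return " ".join(text.split())


if __name__ == "__main__":
    sample = "Email ME at jane.doe@example.com  or call 555-123-4567!!"
    print(deidentify(sample, stub_llm))
```

Running redaction before the rewrite means the rewrite step never sees the raw identifiers, and validating the final output checks that the rewrite did not reintroduce anything the patterns can detect.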

Akshatha Gangadharaiah, Snap Inc.

Akshatha Gangadharaiah is the lead of Data Governance and MyAI Privacy at Snap. She has been working on privacy and governance solutions at Snap for the last 5 years. She holds a Master's degree in Computer Science from the University of California, San Diego.

Bijeeta Pal, Snap Inc.

Bijeeta Pal is a privacy engineer at Snap. Before joining Snap, she completed her PhD in computer science at Cornell University.

Sameera Ghayyur, Snap Inc.

Sameera Ghayyur is currently a privacy engineer at Snap Inc., where she is the primary privacy reviewer for the My AI chatbot product, among many other Snapchat features. She previously worked on the privacy teams at Meta and Honeywell. She received her PhD in computer science from the University of California, Irvine, where her research focused on accuracy-aware privacy-preserving algorithms. She also has experience working as a software engineer and a lecturer.

BibTeX
@conference {296301,
author = {Akshatha Gangadharaiah and Bijeeta Pal and Sameera Ghayyur},
title = {Transforming Data through Deidentification},
year = {2024},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}