Deleting Data at Organizational Scale

Diogo Lucas

Monday, June 03, 2024 - 5:15 pm–5:35 pm

Diogo Lucas, Stripe

Abstract:

Deleting a million records from a dataset can be hard. Deleting one record from a million datasets can be even harder.

Data has a tendency to sprawl. In today's information-hungry world, information is replicated and permuted in myriad ways across data marts, lakes, and warehouses. This proliferation can add massive volume and variety, turning a single input point into many thousands of loosely related downstream entries.

So when it comes to honoring a person's right to be forgotten, how can we find the needle of their information in a company's data-hungry haystack? How can we do that in a world of architectural sprawl and data repurposing? And how do we do all that without breaking legitimate data use cases?

In this session, we will evaluate the fundamental building blocks and practices that allow Stripe to guarantee our customers' (direct and indirect) rights to data deletion. These include detection and attribution of sensitive data and its affiliation, impact analysis through exploration, and the combined use of deletion propagation and orchestration.
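To make the idea of deletion propagation more concrete, here is a minimal sketch under loudly stated assumptions. It is not Stripe's implementation. It assumes a hypothetical lineage map from each dataset to the datasets derived from it, and assumes every record has already been attributed to a data subject through a hypothetical `subject_id` field. A breadth-first walk finds every downstream dataset reachable from the source exactly once. A `dry_run` pass reports how many records would be affected in each dataset (a simple form of impact analysis) before anything is actually deleted.

```python
from collections import deque

def downstream_datasets(lineage, source):
    """Return the source and every dataset derived from it, each exactly once.

    `lineage` is a hypothetical map: dataset name -> list of derived dataset names.
    Traversal is breadth-first; the order is not guaranteed to be topological.
    """
    order, seen, queue = [], {source}, deque([source])
    while queue:
        name = queue.popleft()
        order.append(name)
        for child in lineage.get(name, []):
            if child not in seen:  # a dataset can be reached by several paths
                seen.add(child)
                queue.append(child)
    return order

def propagate_deletion(datasets, lineage, source, subject_id, dry_run=False):
    """Delete one subject's records from `source` and all downstream datasets.

    `datasets` maps dataset name -> list of record dicts; the `subject_id`
    key is an assumed attribution field. Returns the per-dataset count of
    matching records. With dry_run=True nothing is modified.
    """
    affected = {}
    for name in downstream_datasets(lineage, source):
        rows = datasets.get(name, [])
        kept = [r for r in rows if r.get("subject_id") != subject_id]
        affected[name] = len(rows) - len(kept)
        if not dry_run and name in datasets:
            datasets[name] = kept
    return affected
```

Running with `dry_run=True` first lets an operator review the blast radius. The real pass then removes the records everywhere the lineage says the data flowed. In practice, the hard parts the talk points to (discovering lineage and attributing records to a person) are exactly what this sketch takes as given.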

Diogo Lucas, Stripe

Diogo Lucas is an engineering lead on Stripe's privacy infrastructure team. He is deeply involved in privacy-related initiatives such as data deletion and access controls for sensitive data. He has more than 15 years of industry experience, many of them dedicated to automating privacy and broader governance controls.
