Damien Desfontaines, Tumult Labs
Synthetic data generation makes for a convincing pitch: create fake data that follows the same statistical distribution as your real data, so you can analyze it, share it, sell it… while claiming that this is all privacy-safe and compliant, because synthetic data is also "fully anonymous".
How do synthetic data vendors justify such privacy claims? The answer often boils down to "empirical privacy metrics": after generating synthetic data, run measurements on this data and empirically determine whether it's safe enough to release. But how do these metrics work? How useful are they? How much should you rely on them?
This talk will take a critical look at the space of synthetic data generation and empirical privacy metrics, dispel some marketing-fueled myths that are a little too good to be true, and explain what is needed for these tools to be a valuable part of a larger privacy posture.
Damien works at Tumult Labs, a startup that helps organizations share or publish insights from sensitive data using differential privacy. He likes deploying robust anonymization solutions that solve real-world problems while deeply respecting the privacy of the people in the data. He tends to get kind of annoyed when people adopt an approach to anonymization mostly based on ~vibes~, even though the principled solution is, like, right there.
author = {Damien Desfontaines},
title = {Empirical Privacy Metrics: The Bad, the {Ugly{\textellipsis}} and the Good, Maybe?},
year = {2024},
address = {Santa Clara, CA},
publisher = {USENIX Association},
month = jun
}