Machine learning has fundamentally transformed the way we interact with many networked devices around us and has enabled rich user-centric applications. However, this effectiveness raises profound privacy concerns—how do we control the collection and use of user data gathered? This tension between the collection of users' information to improve user experience (e.g., better predictive keyboards on cell phones), and the corresponding privacy concerns is increasing at an alarming rate with the advent of technologies like the Internet of Things. Differential privacy, a rigorous notion of statistical data privacy, has gained immense popularity in both academia and industry for enabling statistical analysis on sensitive user data while preserving user privacy.
In this talk, I will share my experience in deploying differential privacy technology for iOS 10, catering to more than 300 million customers. I will highlight some of the unique theoretical challenges that arose from the scale of the deployment, and how we had to design theoretically optimal algorithms to mitigate the challenges. Furthermore, I will describe some of the systemic challenges that arose from engineering a large scale distributed system under the constraint of differential privacy. As a concrete problem, I will focus on the problem of learning new words people are typing on their keyboard, under the constraint of differential privacy. For example, from what people type on their keyboards we seek to learn trending words (like the word 't3en'), or internet chat lingos in various languages. We outline a solution to this problem which we deployed at scale in over 10 languages.
The project resulted in three 3 granted US patents, more than 200 news articles, and a follow-up paper under submission.
Disclaimer: The algorithmic and systemic details that will be mentioned in the talk are independent of any specific deployment by any company, but focus on the general scientific principles.
Abhradeep Guha Thakurta, University of California, Santa Cruz
Abhradeep Guha Thakurta is an Assistant Professor in the Department of Computer Science at University of California Santa Cruz. His primary research interest is in the intersection of data privacy and machine learning. He focuses on demonstrating, both theoretically and in practice, that privacy enables designing better machine learning algorithms and vice versa. The notion of privacy he is currently interested in is differential privacy.
Prior to joining the academia, Abhradeep was Senior Machine Learning at Apple, where he was the primary algorithm designer in deploying differential privacy on iOS 10. The project is possibly one of the largest industrial deployment of differential privacy and has resulted in at least 200+ press articles, and two granted patents. Before Apple, Abhradeep worked as a post-doctoral researcher at Microsoft Research Silicon Valley and Stanford University, and then he worked as a Research Scientist at Yahoo Research. He did his Ph.D. from The Pennsylvania State University with Prof. Adam Smith.
author = {Abhradeep Guha Thakurta},
title = {Differential Privacy: From Theory to Deployment},
year = {2017},
address = {Vancouver, BC},
publisher = {USENIX Association},
month = aug
}