Abstract

Abstract: This talk will present Tumult Labs’ soon-to-be-open-source platform for SQL-like analytics with configurable differential privacy guarantees. A variety of organizations, including the US Census Bureau, the US Internal Revenue Service, and Wikimedia, currently use it to publicly share aggregate statistics about populations of interest. We outline key design requirements, with particular emphasis on three: usability, to onboard data scientists new to differential privacy; and scalability and expressivity, to support production use cases that produce hundreds of millions of statistics with tight privacy accounting. We explain how these requirements motivated a multi-layer architecture: a user-friendly dataframe-like interface, built on an extensible privacy framework, which in turn runs on Apache Spark. We report on lessons learned and identify open challenges in resolving tensions between pairs of design requirements, specifically between extensibility and scalability, and between usability and explainability/transparency.

Bio: Michael Hay is the Founder/CTO of Tumult Labs, a startup that helps organizations safely release data using differential privacy, and an Associate Professor of Computer Science at Colgate University. He was previously a Research Data Scientist at the US Census Bureau and a Computing Innovation Fellow at Cornell University. He holds a Ph.D. from the University of Massachusetts Amherst and a bachelor's degree from Dartmouth College.