Practical Difficulties in the Application of Differential Privacy for the Release of Large-Scale Public Data Products

Abstract

National statistical agencies are tasked with producing complex, large-scale public data products that are both useful for a variety of purposes and protective of respondent privacy. With large-scale computing power and data set curation now widely available outside national statistical agencies, it has become much more difficult to reasonably characterize the side information an attacker might bring to bear. Formally private methods promise to fill this gap by providing meaningful privacy guarantees against general classes of attackers. At the United States Census Bureau, we have therefore worked to adapt formally private methods for use in the 2020 Decennial Census. However, the scope and complexity of the 2020 Decennial data products, and of the processes used to create them, are in several ways considerably greater than those traditionally addressed in the formal privacy literature. In working to bridge this gap between formal privacy theory and practice for the 2020 Decennial, we have encountered several practical difficulties, including the following:

error estimation / inference adjustment
low-level, large-scale implementation issues

Presented by Philip Leclerc on behalf of the 2020 Decennial Census Disclosure Avoidance System development team.
