Talks
Fall 2013

# Randomized Response for Private Principal Component Analysis

Saturday, Dec. 14, 2013 2:00 pm2:30 pm PST

We consider the problem of privately releasing a low dimensional approximation to a set of data records, represented as a matrix $A$ with each row corresponding to a user, and each column to an attribute. Our primary goal is to compute a subspace that captures the covariance of $A$ as much as possible, or classically known as the principal component analysis (PCA). We aim to protect the privacy of each individual: we assume that each row of $A$ has unit $\ell_2$ norm, and the privacy is defined with respect to any single row change. We show that the well-known randomized response algorithm, with properly tuned parameters, can be both private and with nearly optimal additive quality gap, compared to the best possible singular subspace of $A$. We further show that when $A^TA$ has large gap between its eigenvalues (a reason often cited for PCA), the randomized response algorithm may perform much better. For the upperbounds, we utilize results from random matrix theory and matrix perturbation theory. Our lowerbound is inspired by the recent work of Bun, Ullman, and Vadhan on applying Tardos's fingerprinting codes to constructing hard instances for private mechanisms. In order to extend their technique to subspace approximation, we first prove a stronger property of Tardos codes, and then we show how to find a useful vector hidden in a subspace, via a game which we call the list culling game. Both of these might be of independent interests.