We consider the problem of privately releasing a low dimensional approximation to a set of data records, represented as a matrix $A$ with each row corresponding to a user, and each column to an attribute. Our primary goal is to compute a subspace that captures the covariance of $A$ as much as possible, or classically known as the principal component analysis (PCA). We aim to protect the privacy of each individual: we assume that each row of $A$ has unit $\ell_2$ norm, and the privacy is defined with respect to any single row change. We show that the well-known randomized response algorithm, with properly tuned parameters, can be both private and with nearly optimal additive quality gap, compared to the best possible singular subspace of $A$. We further show that when $A^TA$ has large gap between its eigenvalues (a reason often cited for PCA), the randomized response algorithm may perform much better. For the upperbounds, we utilize results from random matrix theory and matrix perturbation theory. Our lowerbound is inspired by the recent work of Bun, Ullman, and Vadhan on applying Tardos's fingerprinting codes to constructing hard instances for private mechanisms. In order to extend their technique to subspace approximation, we first prove a stronger property of Tardos codes, and then we show how to find a useful vector hidden in a subspace, via a game which we call the list culling game. Both of these might be of independent interests.
We then consider the online setting and show that by combining the randomized response mechanism with the well-known following the perturbed leader (FPL) algorithm, we can obtain a private online algorithm with nearly optimal regret. The regret of our algorithm actually outperforms all the previously known online FPL algorithms even in the non-private setting. We achieve this better bound by, satisfyingly, borrowing insights and tools from differential privacy!
Joint work with Cynthia Dwork, Abhradeep Thakurta and Li Zhang.