We consider the power of linear reconstruction attacks in statistical data privacy, showing that they can be applied to a much wider range of settings than previously understood. Consider a database curator who manages a database of sensitive information but wants to release statistics about how a sensitive attribute (say, disease) in the database relates to some nonsensitive attributes (e.g., postal code, age, gender, etc). We show one can mount linear reconstruction attacks based on any release that gives: a) the fraction of records that satisfy a given non-degenerate boolean function. Such releases include contingency tables as well as more complex outputs like the error rate of certain classifiers such as decision trees; b) any one of a large class of M-estimators including the standard estimators for linear and logistic regression.
We make two contributions: first, we show how these types of releases can be transformed into a linear format, making them amenable to existing polynomial-time reconstruction algorithms. Second, we show how to analyze the resulting attacks under various distributional assumptions on the data.
Joint work with Mark Rudelson and Adam Smith.