# IT Seminar

Vitaly Feldman (IBM Research)

2nd floor interaction area

## Preserving Statistical Validity in Adaptive Data Analysis

Most of existing theory of statistical inference assumes that a fixed hypothesis testing procedure or a learning algorithm is applied to the data, selected non-adaptively before the data is gathered, whereas in practice data is shared and reused with hypotheses and new analyses being generated on the basis of data exploration and the outcomes of previous analyses. In this work we initiate a principled study of how to guarantee the validity of statistical inference in adaptive data analysis. As an instance of this problem, we propose and investigate the question of estimating the expectations of $m$ adaptively chosen functions on an unknown distribution given $n$ random samples.

We show that, surprisingly, there is a way to estimate an exponential in $n$ number of expectations accurately even if the functions are chosen adaptively. This gives an exponential improvement over standard empirical estimators that are limited to a linear number of estimates. Our result follows from a general technique that counter-intuitively involves actively perturbing and coordinating the estimates, using techniques developed for privacy preservation. We give additional applications of this technique to our question.

Based on a joint work with Cynthia Dwork, Moritz Hardt, Omer Reingold, Aaron Roth and Toni Pitassi Coincidentally, Moritz is giving a talk on these results and some follow up work on Apr 2 but my talk will be mostly complementary.