Abstract

Datasets are often reused to perform multiple statistical analyses adaptively: each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies, and little is known about how to provably avoid overfitting. In this talk I'll describe recent work that provides a new framework for addressing this problem. I'll then describe several approaches to the problem based on techniques developed in the context of differential privacy.
Based on joint work with Dwork, Hardt, Pitassi, Reingold, Roth and Steinke.