The problem of handling adaptivity in data analysis, intentional or not, permeates a variety of fields, including test-set overfitting in ML challenges and the accumulation of invalid scientific discoveries. We propose a mechanism for answering an arbitrarily long sequence of potentially adaptive statistical queries, by charging a price for each query and using the proceeds to collect additional samples. Crucially, we guarantee statistical validity without any assumptions on how the queries are generated. We also ensure with high probability that the cost for $M$ non-adaptive queries is $O(\log M)$, while the cost to a potentially adaptive user who makes $M$ queries that do not depend on any others is $O(\sqrt{M})$.
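The two cost asymptotics can be illustrated with a toy per-query pricing schedule (hypothetical, chosen only to reproduce the stated rates; the actual mechanism sets prices quite differently): charging the $t$-th query a price proportional to $1/t$ yields a total of $O(\log M)$, while charging proportional to $1/\sqrt{t}$ yields $O(\sqrt{M})$.

```python
import math

def total_cost(M, price):
    """Cumulative charge after M queries under a per-query price schedule."""
    return sum(price(t) for t in range(1, M + 1))

# Hypothetical schedules chosen only to reproduce the stated asymptotics.
harmonic = lambda t: 1.0 / t             # partial sums grow like ln M -> O(log M)
inv_sqrt = lambda t: 1.0 / math.sqrt(t)  # partial sums grow like 2*sqrt(M) -> O(sqrt M)

for M in (100, 10_000):
    print(M, round(total_cost(M, harmonic), 2), round(total_cost(M, inv_sqrt), 2))
```

Squaring $M$ roughly adds a constant to the first total but doubles the second, matching the logarithmic versus square-root growth.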

### Tuesday, July 24th, 2018

Let James (jamesyzou [at] gmail.com) know if you’d like to give an impromptu short talk or lead a discussion topic.

### Wednesday, July 25th, 2018

We study the stochastic batched convex optimization problem, in which we use many \emph{parallel} observations to optimize a convex function given limited rounds of interaction. In each of $M$ rounds, an algorithm may query for information at $n$ points, and after issuing all $n$ queries, it receives unbiased noisy function and/or (sub)gradient evaluations at those points. After $M$ such rounds, the algorithm must output an estimator. We provide lower and upper bounds on the performance of such batched convex optimization algorithms in zeroth- and first-order settings for the collections of Lipschitz convex and smooth strongly convex functions. The rates we provide exhibit two interesting phenomena: (1) in terms of the batch size $n$, the rate of convergence of batched algorithms (nearly) achieves the conventional fully sequential rate once $M = O(d \log \log n)$, where $d$ is the dimension of the domain, while (2) the rate may degrade exponentially as the dimension $d$ increases, in contrast to fully sequential settings.
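A minimal instance of the batched first-order setting, assuming the simplest batched strategy of issuing all $n$ queries at the current iterate, averaging the noisy gradients, and taking one step per round (the algorithms achieving the rates above are more sophisticated):

```python
import numpy as np

def batched_sgd(grad_oracle, x0, M, n, lr, rng):
    """M rounds; each round issues n parallel noisy (sub)gradient queries at
    the current point, averages the answers, and takes one gradient step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(M):
        g = np.mean([grad_oracle(x, rng) for _ in range(n)], axis=0)
        x = x - lr * g
    return x

# Toy instance: minimize f(x) = 0.5*||x||^2 with unit-variance gradient noise.
rng = np.random.default_rng(0)
oracle = lambda x, rng: x + rng.standard_normal(x.shape)
x_hat = batched_sgd(oracle, x0=np.full(5, 10.0), M=20, n=100, lr=0.5, rng=rng)
print(np.linalg.norm(x_hat))  # close to the minimizer 0
```

Averaging over the batch shrinks the gradient noise by a factor of $\sqrt{n}$, which is why large batches can partially compensate for few rounds of interaction.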

We consider the problem of multiple hypothesis testing with generic side information: for each hypothesis we observe both a p-value and some predictor encoding contextual information about the hypothesis. For large-scale problems, adaptively focusing power on the more promising hypotheses (those more likely to yield discoveries) can lead to much more powerful multiple testing procedures. We propose a general iterative framework for this problem, called the Adaptive p-value Thresholding (AdaPT) procedure, which adaptively estimates a Bayes-optimal p-value rejection threshold and controls the false discovery rate (FDR) in finite samples. At each iteration of the procedure, the analyst proposes a rejection threshold and observes partially censored p-values, estimates the false discovery proportion (FDP) below the threshold, and either stops to reject or proposes another threshold, until the estimated FDP is below $\alpha$. Our procedure is adaptive in an unusually strong sense, permitting the analyst to use any statistical or machine learning method she chooses to estimate the optimal threshold, and to switch between different models at each iteration as information accrues.

This is joint work with Lihua Lei.
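The iterative loop can be sketched in a stripped-down form: a single scalar threshold $s$ in place of AdaPT's covariate-dependent threshold, with the FDP below the threshold estimated by a mirror-type count $(1 + \#\{p_i \ge 1 - s\}) / \max(\#\{p_i \le s\}, 1)$. The shrinkage schedule and starting threshold here are illustrative choices, not part of the procedure itself.

```python
import numpy as np

def adapt_constant_threshold(pvals, alpha, s0=0.45, shrink=0.9):
    """Simplified AdaPT-style loop with a scalar threshold s (no covariates):
    shrink s until the mirror estimate (1 + #{p >= 1-s}) / max(#{p <= s}, 1)
    of the FDP falls below alpha, then reject all p-values <= s."""
    p = np.asarray(pvals)
    s = s0
    while s > 1e-6:
        rejections = int(np.sum(p <= s))
        fdp_hat = (1 + int(np.sum(p >= 1 - s))) / max(rejections, 1)
        if fdp_hat <= alpha:
            return s, rejections
        s *= shrink  # propose a smaller threshold and look again
    return 0.0, 0

rng = np.random.default_rng(1)
# 80 uniform null p-values mixed with 20 small non-null p-values.
p = np.concatenate([rng.uniform(size=80), rng.uniform(0, 0.01, size=20)])
s, R = adapt_constant_threshold(p, alpha=0.1)
print(s, R)
```

Because null p-values are (super)uniform, the count of p-values above $1 - s$ mirrors the number of nulls below $s$, so the estimate overcounts false discoveries and the stopping rule errs on the conservative side.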

Differential privacy provides a rigorous framework for privacy-preserving data analysis. This talk proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Benjamini-Hochberg procedure (BHq), our approach repeatedly adds noise to the logarithms of the p-values to ensure differential privacy and, at each iteration, selects an approximately smallest p-value as a promising candidate; the selected p-values are then supplied to the BHq procedure, and our private procedure releases only the rejected ones. Apart from the privacy considerations, we develop a new technique, based on a backward submartingale argument, for proving FDR control of a broad class of multiple testing procedures, including our private procedure and both the BHq step-up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor. This theoretical guarantee is the first in the FDR literature to explain the empirical validity of the BHq procedure, which we further illustrate in three simulation studies.
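The two stages can be sketched schematically. The noise scale and the number of peeling rounds below are illustrative placeholders, not calibrated to any privacy budget, and the final step-up rule is the ordinary (non-private) BHq applied to the selected p-values.

```python
import numpy as np

def noisy_peeling(pvals, k, noise_scale, rng):
    """Stage 1 (schematic): repeatedly add Laplace noise to the log p-values,
    pick an approximately smallest one, remove it, and repeat k times."""
    log_p = np.log(np.asarray(pvals, dtype=float))
    remaining = list(range(len(pvals)))
    selected = []
    for _ in range(k):
        noisy = [log_p[i] + rng.laplace(scale=noise_scale) for i in remaining]
        winner = remaining[int(np.argmin(noisy))]
        selected.append(winner)
        remaining.remove(winner)
    return selected

def bh_stepup(pvals, alpha):
    """Stage 2: the Benjamini-Hochberg step-up rule on the supplied p-values;
    returns the number of rejections (largest k with p_(k) <= alpha*k/m)."""
    p = np.sort(np.asarray(pvals))
    m = len(p)
    below = np.nonzero(p <= alpha * np.arange(1, m + 1) / m)[0]
    return 0 if below.size == 0 else int(below[-1]) + 1

rng = np.random.default_rng(2)
# 10 strong signals among 90 uniform nulls.
p = np.concatenate([np.full(10, 1e-6), rng.uniform(size=90)])
chosen = noisy_peeling(p, k=20, noise_scale=0.5, rng=rng)
print(bh_stepup(p[chosen], alpha=0.05))
```

Working on the log scale keeps the noisy comparison meaningful for very small p-values, whose differences on the raw scale would be swamped by the noise.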
