We present a general method for privacy-preserving Bayesian inference in Poisson factorization, a broad class of models that includes some of the most widely used models in the social sciences. Our method satisfies limited precision local privacy, a generalization of local differential privacy, which we introduce to formulate privacy guarantees appropriate for sparse count data. We develop an MCMC algorithm that approximates the locally private posterior over model parameters given data that has been locally privatized by the geometric mechanism (Ghosh et al., 2012). Our solution is based on two insights: 1) a novel reinterpretation of the geometric mechanism in terms of the Skellam distribution (Skellam, 1946) and 2) a general theorem that relates the Skellam to the Bessel distribution (Yuan & Kalbfleisch, 2000). We demonstrate our method in two case studies on real-world email data in which we show that our method consistently outperforms the commonly-used naive approach, obtaining higher quality topics in text and more accurate link prediction in networks. On some tasks, our privacy-preserving method even outperforms non-private inference which conditions on the true data.
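As background for the privatization step, here is a minimal sketch of the geometric mechanism of Ghosh et al. (2012). It uses the standard fact that two-sided geometric noise can be sampled as the difference of two i.i.d. geometric variables, the difference-of-two-counts structure that the Skellam reinterpretation builds on. Function names are illustrative, and we assume one individual affects a single count.

```python
import math
import random

def two_sided_geometric(epsilon: float) -> int:
    """Sample noise with P(z) proportional to exp(-epsilon * |z|) as the
    difference of two i.i.d. geometric variables with success
    probability 1 - exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    def geom():
        u = 1.0 - random.random()          # u in (0, 1], avoids log(0)
        return int(math.log(u) / math.log(alpha))
    return geom() - geom()

def privatize_counts(counts, epsilon):
    """epsilon-DP release of a vector of counts, assuming one
    individual contributes to a single count (sensitivity 1)."""
    return [c + two_sided_geometric(epsilon) for c in counts]

random.seed(0)
noisy = privatize_counts([5, 0, 12], epsilon=1.0)
```

The locally private posterior in the abstract conditions on outputs of exactly this kind of privatized count.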

### Monday, April 8th, 2019

We study the problem of estimating finite-sample confidence intervals for the mean of a normal population under the constraint of differential privacy. We consider both the known- and unknown-variance cases and construct differentially private algorithms to estimate confidence intervals. Crucially, our algorithms guarantee finite-sample coverage, as opposed to asymptotic coverage. Unlike most previous differentially private algorithms, we do not require the domain of the samples to be bounded. We also prove lower bounds on the expected size of any differentially private confidence set, showing that our parameters are optimal up to polylogarithmic factors.

In this presentation I will discuss recent work using additive perturbations from elliptical distributions to achieve differential privacy. Examples of elliptical distributions include both the Laplace and Gaussian, but many other options are available that achieve different tail behaviors. I will also provide a seemingly odd result showing that while elliptical distributions can achieve epsilon privacy in finite dimensions, it is impossible in infinite dimensions; elliptical distributions in infinite dimensions can only achieve (epsilon, delta) privacy.

The potential advantages of using robust statistics for privacy have already been discussed in the literature. However, there has been little investigation of how robustness can both preserve differential privacy and deliver statistical accuracy from the private outputs. In this talk we provide an overview of some basic elements of robust statistics that are relevant to privacy, and we additionally discuss recently proposed bias-correction (and inference) methods that can be used for this purpose. The idea we investigate is whether existing privacy mechanisms can be improved in terms of (statistical) utility by making use of these approaches, and we study their behaviour in a simple simulation setting. The results indicate that robustness and bias-correction are worth investigating further and could be employed more widely within privacy-preserving mechanisms.

We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a ‘Neyman-Pearson lemma’ for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin “Truncated-Uniform-Laplace” (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact p-values, which are easily computed in terms of the Tulap random variable.

Using the above techniques, we show that our tests can be applied to give stochastically smallest one-sided confidence intervals and optimal confidence distributions. We also derive uniformly most powerful unbiased (UMPU) two-sided tests, which lead to two-sided confidence intervals. We show that our results can be applied to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that all our tests have exact type I error, and are more powerful than current techniques.
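A sketch of sampling the Tulap noise described above, under the assumption (matching the construction in this line of work) that a Tulap variable with no truncation decomposes as discrete-Laplace noise, i.e. a difference of two geometric variables, plus an independent Uniform(-1/2, 1/2). Names and the release helper are illustrative.

```python
import math
import random

def tulap(b: float) -> float:
    """Sample Tulap noise with parameter b = exp(-epsilon), assuming the
    decomposition: (difference of two geometrics) + Uniform(-1/2, 1/2)."""
    def geometric(p):                      # support {0, 1, 2, ...}
        u = 1.0 - random.random()          # u in (0, 1]
        return int(math.log(u) / math.log(1.0 - p))
    p = 1.0 - b
    return geometric(p) - geometric(p) + random.uniform(-0.5, 0.5)

def dp_release_sum(x_sum: int, epsilon: float) -> float:
    """epsilon-DP release of a binomial sample sum (sensitivity 1);
    the DP-UMP test is a post-processing of this quantity."""
    return x_sum + tulap(math.exp(-epsilon))
```

Because the test depends on the data only through the sample sum, a single Tulap-noised sum suffices for all the tests and p-values described above.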

Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. In this talk we’ll address a basic question about privately testing simple hypotheses: given two distributions P and Q, and a privacy level ε, how many i.i.d. samples are needed to distinguish P from Q subject to ε-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we'll characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level ε, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. This result is an analogue of the classical Neyman–Pearson lemma in the setting of private hypothesis testing. The characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.

Joint work with Clément Canonne, Gautam Kamath, Adam Smith and Jonathan Ullman.
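A hypothetical sketch in the spirit of the clamped log-likelihood-ratio test above (not the paper's exact test): clamping each sample's contribution to [-epsilon, epsilon] bounds the statistic's sensitivity by 2*epsilon, so Laplace noise of scale 2 makes the decision epsilon-DP. All names are illustrative.

```python
import math
import random

def private_llr_test(samples, log_p, log_q, epsilon, threshold=0.0):
    """Clamped, randomized log-likelihood-ratio test (illustrative).

    Each term log P(x) - log Q(x) is clamped to [-epsilon, epsilon], so
    one sample changes the sum by at most 2*epsilon; Laplace noise of
    scale (2*epsilon)/epsilon = 2 then makes the statistic epsilon-DP.
    Returns True when the noisy statistic favors P.
    """
    clamp = lambda v: max(-epsilon, min(epsilon, v))
    stat = sum(clamp(log_p(x) - log_q(x)) for x in samples)
    # Laplace(2) noise as a difference of two exponentials with mean 2
    noise = random.expovariate(0.5) - random.expovariate(0.5)
    return stat + noise > threshold
```

With clearly separated hypotheses, e.g. Bernoulli(0.9) vs Bernoulli(0.1), a few dozen samples already make the noisy test reliable.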

### Tuesday, April 9th, 2019

National statistical agencies are tasked with producing complex, large-scale public data products that are both useful for a variety of purposes and protective of respondent privacy. With large-scale computing power and data set curation available outside of national statistical agencies, it has become much more difficult to reasonably characterize the side information that should be considered. Formally private methods promise to fill this gap by providing meaningful privacy guarantees against general classes of attackers. At the United States Census Bureau we have therefore worked to adapt formally private methods for use in the 2020 Decennial Census. However, the scope and complexity of the 2020 Decennial data products and the processes used to create them are in several ways considerably greater than those traditionally addressed in the formal privacy literature. In working to bridge this gap between formally private theory and practice for the 2020 Decennial, we have encountered several practical difficulties, including the following:

- error estimation / inference adjustment
- low-level, large-scale implementation issues

Presented by Philip Leclerc on behalf of the 2020 Decennial Census Disclosure Avoidance System development team.

A private data federation is a set of autonomous databases that share a unified query interface offering in-situ evaluation of SQL queries over the union of the sensitive data of its members. Owing to privacy concerns, these systems do not have a trusted data collector that can see all of the data, and member databases cannot learn about the individual records held by other members. Federations currently achieve this goal by evaluating queries obliviously using secure multi-party computation. This hides the intermediate result cardinality of each query operator by exhaustively padding it. With cascades of such operators, this padding accumulates into a blow-up in the output size of each operator and a proportional loss in query performance. Hence, existing private data federations do not scale well to complex SQL queries over large datasets.

We introduce Shrinkwrap, a private data federation that offers data owners a differentially private view of the data held by others to improve their performance over oblivious query processing. Shrinkwrap uses computational differential privacy to minimize the padding of intermediate query results, achieving up to a 35X performance improvement over oblivious query processing. When the query needs differentially private output, Shrinkwrap provides a trade-off between result accuracy and query evaluation performance.
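A toy sketch of the padding idea, illustrative only: instead of padding an intermediate result to its worst-case size, pad it to the true size plus positive, shifted Laplace noise. Shrinkwrap itself realizes a truncated mechanism inside secure computation; this sketch only shows the accounting, and the function name and parameters are assumptions.

```python
import math
import random

def dp_padded_size(true_size, epsilon, delta, worst_case):
    """Pad an intermediate cardinality to true size + Laplace(1/epsilon)
    noise, shifted up so the noise is non-negative with probability
    1 - delta, then clamped to the valid range (illustrative sketch)."""
    scale = 1.0 / epsilon                           # sensitivity-1 count
    shift = scale * math.log(1.0 / (2.0 * delta))   # P(noise + shift < 0) <= delta
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    padded = int(math.ceil(true_size + noise + shift))
    return max(true_size, min(worst_case, padded))
```

The gap between `worst_case` and the typical padded size is exactly where the reported performance gains over oblivious processing come from.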

Within the U.S. Federal Statistical System there is a common push to move statistical agencies from relying primarily on survey data toward creating statistics that combine survey and administrative data. This push is shared by program agencies, and, based on recommendations of the Commission on Evidence-Based Policymaking, new legislation is now in place that promotes access to administrative data and the sharing of such data across program agencies. All of these policy documents mention proper protection of privacy as a desired goal. This presentation will introduce a set of typical administrative data; describe its structure, size, and types of variables; describe applications/use cases for the data; showcase typical amounts of data cleaning and feature generation; and discuss the challenges for analysts should data be accessible only in a DP context.

Many data producers seek to provide users access to confidential data without unduly compromising data subjects' privacy and confidentiality. One general strategy requires users to perform analyses without seeing the confidential data; for example, analysts only get access to synthetic data or query systems that provide disclosure-protected outputs of statistical models. With synthetic data or redacted outputs, the analyst never really knows how much to trust the resulting findings. If users perform the same analysis using the confidential and synthetic data, would regression coefficients of interest be statistically significant in both analyses? Do regression coefficients from both analyses fall within given intervals? We present algorithms for addressing these questions while satisfying differential privacy. We describe conditions under which some of the algorithms provide adequate answers. We illustrate the properties of the proposed methods using artificial and genuine data.

In this talk, I demonstrate fundamental lower bounds on locally private learning and estimation in all known models of differential privacy--pure, Renyi, approximate--for all values of the privacy parameter and under arbitrary mechanisms of interaction. I will also discuss algorithms with matching upper bounds, giving the first practical (and minimax optimal) methods for solving large-scale statistical learning and risk-minimization problems, with both theoretical and empirical evaluation.

Our panel starts with the following related questions:

- When we talk about the “science of data analysis,” what different things might we mean?
- In statistics and computer science there is often a gap between theoretical models and applied methodology. In the context of the “science of data analysis”, how might we bridge this gap?

We have two concrete examples in which to think about these questions.

- How should the current debate over the value of reporting statistical significance influence the development of methods for analyzing data privately? Reference: American Statistician, Vol. 73, Issue 1. https://www.tandfonline.com/toc/utas20/73/sup1?nav=tocList
- How do we reconcile the different views/approaches to the problem of adaptive data analysis suggested in different literatures with statistical practice? References: http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf and http://science.sciencemag.org/content/349/6248/636.full

If you have other ideas/examples/questions that you would like to see addressed by the panel, please email Aleksandra Slavkovic (sesa at psu.edu).

### Wednesday, April 10th, 2019

The change-point detection problem seeks to identify distributional changes at an unknown change-point k* in a stream of data. This problem appears in many important practical settings involving personal data, including biosurveillance, fault detection, finance, signal detection, and security systems. The field of differential privacy offers data analysis tools that provide powerful worst-case privacy guarantees. We study the statistical problem of change-point detection through the lens of differential privacy. We give private algorithms for both online and offline change-point detection, analyze these algorithms theoretically, and provide empirical validation of our results.

Joint work with Sara Krehbiel, Yajun Mei, Rui Tuo, and Wanrong Zhang.
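One hedged way to instantiate offline private change-point detection, illustrative rather than the paper's algorithm, is report-noisy-max over clamped CUSUM-style statistics. Names, the clamping scheme, and the noise calibration below are assumptions.

```python
import math
import random

def private_argmax_changepoint(series, log_p0, log_p1, epsilon, clamp=1.0):
    """Offline change-point detection via report-noisy-max (sketch).

    stat(k) is the suffix sum of per-point log-likelihood ratios of the
    post-change vs pre-change density, clamped to [-clamp, clamp]. One
    data point moves each stat(k) by at most 2*clamp, so Laplace noise
    of the conservative scale 2*(2*clamp)/epsilon on each statistic
    makes the reported argmax epsilon-DP."""
    llr = [max(-clamp, min(clamp, log_p1(x) - log_p0(x))) for x in series]
    n = len(series)
    stats = [0.0] * n
    suffix = 0.0
    for k in range(n - 1, -1, -1):        # stat(k) as a suffix sum
        suffix += llr[k]
        stats[k] = suffix
    scale = 4.0 * clamp / epsilon
    best_k, best_val = None, -math.inf
    for k in range(1, n):                 # candidate change-points
        noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
        if stats[k] + noise > best_val:
            best_k, best_val = k, stats[k] + noise
    return best_k
```

With a clear pre/post-change separation the noisy argmax lands near the true change-point k*.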

The Stochastic Multi-Armed Bandit is one of the canonical learning problems, studied extensively since the 1950s. The problem is modeled as taking an action in each timestep and observing feedback in the form of a reward, and its goal is to play a sequence of actions whose overall reward is comparable to that of the best action in hindsight. In this talk we will survey a collection of results regarding private learning in the stochastic MAB setting. We will introduce the canonical UCB algorithm and its differentially private version, discuss extensions to linear and contextual bandits, introduce a lower bound for any \epsilon-differentially private algorithm for this problem, and finally introduce a different privacy-preserving algorithm that meets this lower bound.

This is joint work with Roshan Shariff and Touqir Sajed.
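A simplified DP-UCB sketch in the spirit of the survey; illustrative only, since deployed private bandit algorithms privatize the running reward sums with tree-based counters rather than the per-round Laplace noise assumed here.

```python
import math
import random

def dp_ucb(reward_fns, horizon, epsilon, seed=0):
    """Simplified differentially private UCB (illustrative): each arm's
    index adds Laplace noise of scale 1/(epsilon * n_a) to the
    empirical mean, plus an extra log(t)/(epsilon * n_a) bonus to
    absorb that noise. Returns the pull counts per arm."""
    random.seed(seed)
    k = len(reward_fns)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                            # pull each arm once
        else:
            def index(a):
                rate = epsilon * counts[a]
                lap = random.expovariate(rate) - random.expovariate(rate)
                bonus = math.sqrt(2.0 * math.log(t) / counts[a]) \
                        + math.log(t) / (epsilon * counts[a])
                return sums[a] / counts[a] + lap + bonus
            arm = max(range(k), key=index)
        reward = reward_fns[arm]()
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

On two Bernoulli arms with well-separated means, the private index still concentrates its pulls on the better arm.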

Popular approaches to differential privacy, such as the Laplace and exponential mechanisms, calibrate randomised smoothing through the global sensitivity of the target non-private function. Bounding such sensitivity can be prohibitively complex for end users. As an alternative, we propose a straightforward sampler for estimating the sensitivity of non-private mechanisms. Since our sensitivity estimates hold with high probability, any mechanism that would be (ϵ,δ)-differentially private under bounded global sensitivity automatically achieves (ϵ,δ,γ)-random differential privacy (Hall, Rinaldo, Wasserman, 2012), without any target-specific calculations required. We demonstrate on worked examples how our usable approach trades a naturally relaxed privacy guarantee for more accurate releases, even for non-private functions that are black-box computer programs.
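The sensitivity sampler can be sketched as follows. This is an illustrative version: in the actual method the choice of order statistic and number of trials determines the failure probability γ, which is only gestured at here.

```python
import random

def estimate_sensitivity(f, data_sampler, neighbor, trials=1000):
    """Estimate the sensitivity of a black-box function f by sampling
    dataset pairs differing in one record and taking a high quantile of
    the observed |f(D) - f(D')|. The bound holds only with probability
    over the sampling, yielding random DP rather than pure DP."""
    diffs = []
    for _ in range(trials):
        d = data_sampler()
        d_prime = neighbor(d)
        diffs.append(abs(f(d) - f(d_prime)))
    diffs.sort()
    return diffs[int(0.99 * (len(diffs) - 1))]   # 99th-percentile estimate
```

For the mean of n values in [0, 1], for example, the estimate never exceeds the true global sensitivity 1/n, and any noise mechanism calibrated to it inherits the high-probability guarantee.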

This work studies differential privacy in the context of the recently proposed shuffle model. Unlike in the local model, where the server collecting privatized data from users can trace an input back to a specific user, in the shuffle model users submit their privatized inputs to a server anonymously. This setup yields a trust model which sits in between the classical curator and local models for differential privacy. The shuffle model is the core idea in the Encode, Shuffle, Analyze (ESA) model introduced by Bittau et al. (SOSP 2017). Recent work by Cheu et al. (Forthcoming, EUROCRYPT 2019) analyzes the differential privacy properties of the shuffle model and shows that in some cases shuffled protocols provide strictly better accuracy than local protocols. Additionally, Erlingsson et al. (SODA 2019) provide a privacy amplification bound quantifying the level of curator differential privacy achieved by the shuffle model in terms of the local differential privacy of the randomizer used by each user.

In this context, we make three contributions. First, we provide an optimal single message protocol for summation of real numbers in the shuffle model. Our protocol is very simple and has better accuracy and communication than the protocols for this same problem proposed by Cheu et al. Optimality of this protocol follows from our second contribution, a new lower bound for the accuracy of private protocols for summation of real numbers in the shuffle model. The third contribution is a new amplification bound for analyzing the privacy of protocols in the shuffle model in terms of the privacy provided by the corresponding local randomizer. Our amplification bound extends the results by Erlingsson et al. to a wider range of parameters in the non-adaptive setting, and provides a whole family of methods to analyze privacy amplification in the shuffle model.

Joint work with J. Bell, A. Gascón, and K. Nissim.
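A schematic of the encode-shuffle-analyze pipeline for summation. The Gaussian local randomizer below is a placeholder for the paper's single-message fixed-point protocol; all names and the noise scale are assumptions, shown only to make the three roles concrete.

```python
import random

def local_randomizer(x, sigma=0.1):
    """Each user perturbs their value locally; a Gaussian perturbation
    stands in for the actual single-message randomizer."""
    return x + random.gauss(0.0, sigma)

def shuffler(messages):
    """The shuffler forwards messages in a random order, stripping the
    link between each message and its user."""
    shuffled = list(messages)
    random.shuffle(shuffled)
    return shuffled

def analyzer(messages):
    """The analyzer only needs the (unordered) sum of the messages."""
    return sum(messages)

random.seed(0)
data = [random.random() for _ in range(1000)]
messages = shuffler([local_randomizer(x) for x in data])
estimate = analyzer(messages)
```

Because summation is order-invariant, the analyzer loses nothing from the shuffle, while the amplification bounds discussed above quantify how much privacy the anonymity buys.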

### Thursday, April 11th, 2019

We will discuss two fundamental inference tasks, distribution learning and testing, under Local Differential Privacy (LDP).

We start by considering density estimation (distribution learning). We will propose two private-coin LDP schemes, which require log k bits and one bit of communication per player, respectively, while still being sample-optimal. This is in contrast with all known schemes (e.g., [EPK14, DJW13, KBW16, YB18]), which require \Omega(k) communication. It is also known that the availability of public coins does not change the sample complexity of learning.
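For comparison, here is the classical k-ary randomized response baseline, which communicates a full symbol per player; it is not one of the talk's communication-efficient schemes, just a self-contained reference point for the LDP distribution-learning task.

```python
import math
import random

def k_rr_encode(x, k, epsilon):
    """k-ary randomized response: report the true symbol with
    probability e^eps / (e^eps + k - 1), else a uniform other symbol."""
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if random.random() < p_true:
        return x
    y = random.randrange(k - 1)
    return y if y < x else y + 1          # uniform over the k-1 others

def k_rr_decode(reports, k, epsilon):
    """Debias the empirical frequencies of the noisy reports."""
    n = len(reports)
    e = math.exp(epsilon)
    p_true = e / (e + k - 1)
    p_other = 1.0 / (e + k - 1)
    freq = [0] * k
    for r in reports:
        freq[r] += 1
    return [((f / n) - p_other) / (p_true - p_other) for f in freq]
```

Each report satisfies epsilon-LDP on its own, and the debiased frequencies are unbiased estimates of the underlying distribution.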

We then consider the distribution testing (goodness-of-fit) problem, where the goal is to test whether a distribution is equal to a known reference distribution or far from it. Unlike for distribution learning, we show that the sample complexity reduces dramatically for public-coin schemes compared to private-coin protocols.

To prove the lower bounds, we reduce the problem to bounding a new quantity we define, the chi^2 contraction. For public-coin schemes, we bound the min-max chi^2 contraction, and for private-coin schemes, we bound the max-min chi^2 contraction. Here the min is over appropriately chosen priors, and the max is over the LDP schemes.

Joint works with Clement Canonne, Cody Freitag, Ziteng Sun, Himanshu Tyagi, and Huanyu Zhang.

Sampling is a powerful tool that is widely used in statistics and machine learning for nonparametric / distribution-free inference. Interestingly, it also finds its place as the centerpiece of many useful algorithms and fundamental theory in differential privacy under the name of “privacy amplification” or the “subsampling lemma”, which roughly says that if we apply an (ε, δ)-DP mechanism to a **randomly** chosen subsample of the data (say, a γ fraction), then a stronger “amplified” privacy guarantee of (O(γε), γδ)-DP can be proven.
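The amplification statement can be checked numerically; here is a small helper implementing the standard subsampling bound (the helper name is illustrative).

```python
import math

def amplified_eps(epsilon, gamma):
    """Standard subsampling lemma: an (epsilon, delta)-DP mechanism run
    on a random gamma-fraction subsample satisfies
    (log(1 + gamma * (exp(epsilon) - 1)), gamma * delta)-DP.
    For small epsilon this is roughly gamma * epsilon, matching the
    O(gamma * epsilon) statement above."""
    return math.log(1.0 + gamma * (math.exp(epsilon) - 1.0))
```

For example, an epsilon = 1 mechanism run on a 1% subsample already enjoys an amplified epsilon of about 0.017.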

In this talk, I will present a new privacy amplification result under the formalism of Renyi Differential Privacy. The result allows us to perform tighter privacy composition over a sequence of heterogeneous subsampled mechanisms, and also to understand the nature of subsampled mechanisms on a more fine-grained scale.

I will talk about the mathematics underlying this problem as well as the numerical problems that arise when implementing the bound.

If time permits, I will also talk about a recent unpublished work that addresses the same problem (but more precisely) for the case of Poisson Sampling — when each data point is selected independently at random.

Differentially Private algorithms often need to select the best amongst many candidate options. Classical works on this selection problem, such as the Exponential Mechanism, require that the candidates’ goodness, measured as a real-valued score function, does not change by much when one person’s data changes. In many applications such as hyperparameter optimization, this stability assumption is much too strong. I will talk about recent work, where we consider the selection problem under a much weaker stability assumption on the candidates, namely that the score functions are differentially private. Under this assumption, I will describe algorithms that are near-optimal along the three relevant dimensions: privacy, utility and computational efficiency.

Joint work with Jingcheng Liu.

We consider learning in large-scale systems under the constraint of local differential privacy. For many learning problems, known efficient algorithms in this model require many rounds of communication between the server and the clients holding the data points. Yet multi-round protocols are prohibitively slow in practice due to network latency, and, as a result, currently deployed large-scale systems are limited to a single round. Very little is known about which learning problems can be solved by such non-interactive systems. The only lower bound we are aware of is for PAC learning an artificial class of functions with respect to a uniform distribution (Kasiviswanathan et al., 2008).

We show that the margin complexity of a class of functions is a lower bound on the complexity of any non-interactive algorithm for distribution-independent learning of the class. In particular, the classes of linear separators and decision lists require an exponential number of samples to learn non-interactively, even though they can be learned in polynomial time by an interactive algorithm. Our lower bound also holds against a stronger class of algorithms for which only the queries that depend on labels are non-interactive (we refer to them as label-non-adaptive). We complement this lower bound with a new efficient, label-non-adaptive learning algorithm whose complexity is polynomial in the margin complexity.

Joint work with Amit Daniely.

The class of halfspaces forms an important primitive in machine learning as learning halfspaces implies learning many other concept classes. We study the question of privately learning halfspaces and present a private learner for halfspaces over an arbitrary finite domain X subset R^d with sample complexity poly(d,2^{\log^*|X|}). The building block for this learner is a differentially private algorithm for locating an approximate center point of points -- a high dimensional generalization of the median function. Our construction establishes a relationship between these two problems that is reminiscent of the relation between the median and learning one-dimensional thresholds [Bun et al. FOCS '15]. This relationship suggests that the problem of privately locating a center point may have further applications in the design of differentially private algorithms. We also provide a lower bound on the sample complexity for privately finding a point in the convex hull.

The discovery of heavy hitters (most frequent items) in user-generated data streams drives improvements in the app and web ecosystems, but often comes with substantial privacy risks. To address these risks, we propose a distributed and privacy-preserving algorithm for discovering the heavy hitters in a population of user-generated data streams. We critically leverage the sampling property of our distributed algorithm to prove that it is inherently differentially private, without requiring any addition of noise. We also examine the trade-off between privacy and utility, and show that our algorithm provides excellent utility while also achieving strong privacy guarantees. We validate our findings both theoretically, using worst-case analyses, and practically, using a Twitter dataset with 1.6M tweets and over 650k users.

In this talk, I will present new differentially private learning algorithms that are given access to a limited amount of public unlabeled data. These algorithms use any non-private learning algorithm as a black box. We prove formal privacy and accuracy guarantees for such algorithms. The accuracy guarantees depend only on the accuracy of the underlying non-private learning algorithm. As a consequence, we prove upper bounds on the private sample complexity in the Probably Approximately Correct (PAC) learning model. These bounds are completely characterized by the VC dimension of the concept class.

### Friday, April 12th, 2019

Since its inception as a field of study roughly one decade ago, research in algorithmic fairness has exploded. Much of this work focuses on so-called "group fairness" notions, which address the relative treatment of different demographic groups. More theoretical work has advocated for "individual fairness", which, speaking intuitively, requires that people who are similar, with respect to a given classification task, should be treated similarly by classifiers for that task. Both approaches face significant challenges: for example, provable incompatibility of natural fairness desiderata (for the group case), and the absence of similarity information (for the individual case).

In the last two years the theory literature has begun to explore notions that try to bridge the gap between group and individual fairness. These works raise compelling questions with ties to well-studied topics in statistics, such as forecasting and combining the opinions of experts, as well as psychology, law, economics, and political science. We will discuss some of these questions from the starting point of multi-calibration (Hebert-Johnson, Kim, Reingold, and Rothblum, 2018) and very recent related work on evidence-based rankings (Dwork, Kim, Reingold, Rothblum, and Yona).

A profound discovery in statistical learning, which is sometimes referred to as “The Fundamental Theorem of PAC Learning”, asserts that the following properties are equivalent for an hypothesis class H:

(i) H is PAC learnable,

(ii) H satisfies the Uniform Convergence property,

(iii) H has a finite VC dimension.

This result provides formal justification for principles such as Empirical Risk Minimization (which amounts to the equivalence between items (i) and (ii)). It also sheds light on basic concepts such as overfitting.

We will present and discuss an analogous conjecture for private learning:

Conjecture. The following properties are equivalent for an hypothesis class H:

(i) H is privately PAC learnable,

(ii) H satisfies the Private Uniform Convergence property, and

(iii) H has a finite Littlestone dimension.

Private uniform convergence is closely related to the notions of Synthetic Data and Sanitization from the differential privacy literature. It means that given a labelled sample S as input, one can privately publish an output sample S’ such that every hypothesis in H has (roughly) the same loss with respect to both S and S’. The Littlestone dimension is a combinatorial parameter which can be seen as an adaptive version of the VC dimension.

Based on joint works with Noga Alon, Olivier Bousquet, Roi Livni, and Maryanthe Malliaris.

Noise addition is the most basic technique for guaranteeing differential privacy. However, noise is usually scaled to the "worst-case" or "global" sensitivity of the function being computed, and significant efforts are taken to ensure that this is minimized. Nissim, Raskhodnikova, & Smith (STOC 2007) showed that it is possible to add substantially less noise provided that the function has low "local" sensitivity on the realized dataset and all nearby datasets. This talk revisits their "smooth sensitivity" framework, provides new algorithms within the framework, and considers applications to basic problems that can have infinite global sensitivity.
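The canonical worked example from the smooth sensitivity framework is the median. A sketch for data in a bounded domain follows; the domain bounds passed as parameters are assumptions of this illustration.

```python
import math

def smooth_sensitivity_median(data, epsilon, lower=0.0, upper=1.0):
    """Smooth sensitivity of the median (Nissim, Raskhodnikova & Smith):
    S(x) = max_k e^{-k*eps} * max_{t=0..k+1} (x[m+t] - x[m+t-k-1]),
    where m indexes the median and out-of-range entries are clamped to
    the domain bounds. Noise from a heavy-tailed admissible
    distribution calibrated to S(x) then replaces global-sensitivity
    noise."""
    x = sorted(data)
    n = len(x)
    m = (n - 1) // 2                       # index of the median
    def at(i):                             # clamp indices to the domain
        if i < 0:
            return lower
        if i >= n:
            return upper
        return x[i]
    s = 0.0
    for k in range(n + 1):
        ls_k = max(at(m + t) - at(m + t - k - 1) for t in range(k + 2))
        s = max(s, math.exp(-k * epsilon) * ls_k)
    return s
```

On tightly clustered data the smooth sensitivity is exponentially smaller than the global sensitivity of the median, which is the whole point of the framework.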

We consider three classes of locally private algorithms: non-interactive algorithms, sequentially interactive algorithms, and fully interactive algorithms. It has long been known that sequentially interactive algorithms can have exponentially smaller sample complexity than non-interactive algorithms for certain problems, but the relationship between sequential interactivity and full interactivity has been much murkier. We define a natural class of mechanisms parameterized by a number k, for which we can show that sequentially interactive algorithms can simulate fully interactive algorithms in this class with an O(k) blowup in sample complexity. We show this is tight by exhibiting a problem for which this O(k) blowup in sample complexity is necessary.

We then define a class of compound hypothesis testing problems (including every simple hypothesis testing problem) for which there always exists a non-interactive hypothesis test that is optimal, even within the class of fully interactive locally private algorithms.

While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of adaptivity---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general purpose algorithms that ensure post hoc generalization (also called robust or post-selection generalization), which says that, given the output of the algorithm, it is hard to find any statistic for which the data differs significantly from the population it came from.

In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, showing a strong barrier to progress in post selection data analysis. Second, we show that post hoc generalization is not closed under composition, despite many examples of such algorithms exhibiting strong composition properties.

Joint work with Kobbi Nissim, Adam Smith, Thomas Steinke, and Jonathan Ullman.

We introduce a new notion of the stability of computations, and show that the notion is both necessary and sufficient to ensure generalization in the face of adaptivity, for any computations that respond to bounded-sensitivity linear queries while providing accuracy with respect to the data sample set.

Joint work with Katrina Ligett (Hebrew University).