No abstract available.
Monday, May 6th, 2019
Data collected from individuals is the fuel that drives the modern AI revolution. Intense recent discussions have focused on how to give individuals control over when their data can and cannot be used---the EU's Right To Be Forgotten regulation is one example of this effort. Here we initiate a framework for studying what to do when it is no longer permissible to deploy models derived from certain individual user data. In particular, we formulate the problem of how to efficiently delete individual data from ML models that have been trained on that data. For many standard ML models, the only way to completely remove a person's data is to retrain the whole model from scratch on the remaining data, which is not practically feasible in many settings. We investigate ML models from which data can instead be deleted efficiently.
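As a toy illustration of the distinction the abstract draws (and not the construction studied in the talk), consider a model whose parameters are simple sufficient statistics of the data, such as a centroid: a single user's point can then be removed exactly without retraining. The class and data below are hypothetical:

```python
# Toy sketch: exact deletion from a mean-based model. A centroid model
# stores only the running sum and count of its training points, so one
# user's point can be removed exactly, in O(1) time, without touching
# the rest of the data or retraining from scratch.

class CentroidModel:
    def __init__(self):
        self.total = 0.0
        self.count = 0

    def add(self, x):
        self.total += x
        self.count += 1

    def delete(self, x):
        # Exact removal of one point's contribution: no retraining needed.
        self.total -= x
        self.count -= 1

    def predict(self):
        return self.total / self.count

model = CentroidModel()
for x in [1.0, 2.0, 3.0, 4.0]:
    model.add(x)
model.delete(3.0)
# The resulting model is identical to one retrained on [1.0, 2.0, 4.0].
```

Models whose parameters are not simple aggregates of the data (e.g. deep networks) do not admit such a direct update, which is what makes the general problem hard.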
We propose a Bayesian approach to measure loss of privacy and apply it to the design of optimal mechanisms. The privacy loss associated with a mechanism is defined as the difference between the designer's prior and posterior beliefs about an agent's type, where this difference is calculated using Kullback-Leibler divergence, and where the change in beliefs is triggered by the agent's actions in the mechanism. We consider both ex-ante and ex-post (with respect to the type realizations) measures of privacy loss and apply them to study the properties of optimal privacy-constrained mechanisms and the relation between welfare/profits and privacy levels.
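The abstract's privacy-loss measure can be sketched numerically. The direction of the divergence and the numbers below are illustrative assumptions, not taken from the paper:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Designer's prior over two agent types, and posterior belief after
# observing the agent's action in the mechanism (made-up numbers).
prior = [0.5, 0.5]
posterior = [0.9, 0.1]

# Privacy loss as the divergence between posterior and prior beliefs.
privacy_loss = kl_divergence(posterior, prior)
```

If the agent's action reveals nothing (posterior equals prior), the loss is zero; the more informative the action, the larger the divergence.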
No abstract available.
No abstract available.
Tuesday, May 7th, 2019
Technological developments often serve as catalysts for new fields of study. Powered by data and learning techniques, algorithms are now implementing balances among societal values that were traditionally addressed by legal doctrines. This development has already motivated successful research on differential privacy and, more recently, algorithmic fairness. Is it time for a broader research effort on algorithms and law? The goal of this talk is to facilitate a discussion on what a joint research field involving algorithm designers and legal scholars could possibly look like. We focus in particular on algorithm design rather than computer science as a whole, leaving “computational law” topics such as legal text mining out of scope for this talk.
At a very high level, an algorithm is not far from a legal doctrine: an algorithm gets input and makes a decision, with the goal of optimizing some objective subject to constraints; similarly, a legal doctrine is applied to decide cases by balancing among different objectives and constraints. We cover two recent examples of algorithms required to implement legal doctrines (beyond privacy and fairness): (1) Autonomous driving and duty of care; (2) Algorithmic pricing and antitrust. In the opposite direction, we discuss how personalization of law is bringing legal doctrines closer to algorithms that take input details into account. We close by raising perhaps the biggest challenge to an interdisciplinary research field of algorithms and law – bridging between the mathematical and text-based approaches.
As algorithms are increasingly used to make important decisions pertaining to individuals, algorithmic discrimination is becoming a prominent concern. The seminal work of Dwork et al. [ITCS 2012] introduced the notion of individual fairness (IF): given a task-specific similarity metric, every pair of similar individuals should receive similar outcomes. In this work, we study fairness when individuals have diverse preferences over the possible outcomes. We show that in such settings, individual fairness can be too restrictive: requiring individual fairness can lead to less-preferred outcomes for the very individuals that IF aims to protect (e.g. a protected minority group).
We introduce and study a new notion of preference-informed individual fairness (PIIF), a relaxation of individual fairness that allows for outcomes that deviate from IF, provided the deviations are in line with individuals' preferences. We show that PIIF can allow for solutions that are considerably more beneficial to individuals than the best IF solution. Motivated by fairness concerns in targeted advertising, we apply this new fairness notion to the multiple-task setting introduced by Dwork and Ilvento [ITCS 2019]. Finally, we demonstrate that in both settings, any convex objective over the outcomes subject to PIIF can be efficiently optimized for a rich class of individual preferences.
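The baseline IF constraint of Dwork et al. that PIIF relaxes can be sketched as a pairwise Lipschitz check; the metric and outcome probabilities below are hypothetical:

```python
# A minimal sketch of the individual-fairness (IF) constraint of
# Dwork et al.: similar individuals must receive similar outcomes.
# Here each individual's outcome is a single acceptance probability,
# and similarity is given by a task-specific metric d.

def satisfies_if(outcome, d, individuals):
    """Check |outcome(x) - outcome(y)| <= d(x, y) for every pair."""
    return all(
        abs(outcome[x] - outcome[y]) <= d(x, y)
        for x in individuals for y in individuals
    )

# Hypothetical two-individual example: the metric deems them similar
# (distance 0.1), so their acceptance probabilities may differ by at
# most 0.1.
individuals = ["a", "b"]
d = lambda x, y: 0.0 if x == y else 0.1
fair = satisfies_if({"a": 0.50, "b": 0.55}, d, individuals)    # True
unfair = satisfies_if({"a": 0.20, "b": 0.90}, d, individuals)  # False
```

PIIF would additionally permit outcomes that violate this check, provided each affected individual weakly prefers the deviation.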
In this talk, we investigate the role of information in *fair* prediction. A common strategy for decision-making uses a predictor to assign individuals a risk score; then, individuals are selected or rejected on the basis of this score. We formalize a framework for measuring the information content of such predictors, showing that increasing information content through a certain kind of "refinement" improves the downstream selection rules across a wide range of fairness measures (e.g. true positive rates, false positive rates, selection rates). In turn, refinements provide a simple but effective tool for reducing disparity in treatment and impact without sacrificing the utility of the predictions. Our results suggest that in many applications, the perceived “cost of fairness” results from an information disparity across populations, and thus, may be avoided with improved information. We conclude by discussing how our information-theoretic perspective on fairness may shed new light on the feasibility of simultaneously *fair and private* predictions. Based on joint work with Sumegha Garg and Omer Reingold.
No abstract available.
Wednesday, May 8th, 2019
Modern machine learning models exhibit amazing accuracy on tasks from image classification to natural-language processing, but accuracy does not tell the entire story of what these models have learned. Does a model memorize and leak its training data? Does it "accidentally" learn representations and tasks uncorrelated with its training objective? Do techniques for censoring representations and adversarial regularization prevent unwanted learning? Is it possible to train "least-privilege" models that learn only the task they were given? I will demonstrate unwanted-learning phenomena in several practical models and discuss why obvious solutions do not appear to work.
Hardware as a root of trust presents an interesting, and intriguing, alternative to secure multi-party computation. We survey several approaches to hardware-based security, such as hardware tokens, secure elements, and secure enclaves, with applications to privacy amplification via shuffling. We conclude with open research directions and problems.
Differential privacy has seen remarkable success in the past decade. But it also has some well-known weaknesses: notably, it does not tightly handle composition. This weakness has inspired several recent relaxations of differential privacy based on Renyi divergences. We propose an alternative relaxation of differential privacy, which we term "f-DP", which has a number of nice properties and avoids some of the difficulties associated with divergence-based relaxations. First, it preserves the hypothesis-testing interpretation of differential privacy, which makes its guarantees easily interpretable. It allows for lossless reasoning about composition and post-processing, and notably, a direct way to analyze privacy amplification by subsampling. We define a canonical single-parameter family of definitions within our class, termed "Gaussian Differential Privacy", based on the hypothesis-testing region defined by two Gaussian distributions. We show that this family is focal by proving a central limit theorem: the privacy guarantees of *any* hypothesis-testing-based definition of privacy (including differential privacy) converge to Gaussian differential privacy in the limit under composition. This central limit theorem also gives a tractable analysis tool. We demonstrate the use of the tools we develop by giving an improved analysis of the privacy guarantees of noisy stochastic gradient descent. Based on joint work with Aaron Roth and Weijie Su.
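The Gaussian trade-off function and its composition rule can be written out directly. This sketch follows the standard formulation of mu-GDP, G_mu(alpha) = Phi(Phi^{-1}(1 - alpha) - mu), with composed guarantees adding in quadrature:

```python
from statistics import NormalDist

N = NormalDist()  # standard normal, Phi = N.cdf

def gdp_tradeoff(mu, alpha):
    """Trade-off function of mu-GDP: G_mu(alpha) = Phi(Phi^{-1}(1-alpha) - mu).

    G_mu(alpha) is the smallest type-II error achievable by any test with
    type-I error alpha when distinguishing N(0, 1) from N(mu, 1).
    """
    return N.cdf(N.inv_cdf(1 - alpha) - mu)

def compose(mu1, mu2):
    """Composition of a mu1-GDP and a mu2-GDP mechanism is
    sqrt(mu1^2 + mu2^2)-GDP."""
    return (mu1**2 + mu2**2) ** 0.5
```

At mu = 0 the curve is the perfect-privacy line 1 - alpha (the two hypotheses are indistinguishable); larger mu pushes the curve down, meaning an adversary can test more accurately.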
In the 90s, software engineering shifted from packaged software and PCs to services and clouds, enabling distributed architectures that incorporate real-time feedback from users. In the process, digital systems became layers of technologies metricized under the authority of objective functions. These functions drive the selection of software features, service integration, cloud usage, user interaction and growth, customer service, and environmental capture, among others. Whereas information systems focused on the storage, processing, and transport of information, and on organizing knowledge (with associated risks of surveillance), contemporary systems leverage the knowledge they gather not only to understand the world, but also to optimize it, seeking maximum extraction of economic value through the capture and manipulation of people's activities and environments. The ability of these optimization systems to treat the world not as a static place to be known, but as one to sense and co-create, poses social risks and harms such as social sorting, mass manipulation, asymmetrical concentration of resources, majority dominance, and minority erasure. In the vocabulary of optimization, these harms arise from choosing inadequate objective functions. During the talk, I will provide an account of what we mean by optimization systems, detail their externalities, and make a proposition for Protective Optimization Technologies.
No abstract available.
Thursday, May 9th, 2019
Multiple hypothesis testing, the situation in which we wish to test many hypotheses at once, is a core problem in statistical inference that arises in almost every scientific field. In this setting, controlling the false discovery rate (FDR), the expected proportion of false discoveries among all discoveries, is an important challenge for making meaningful inferences. In this talk, we consider a setting where an ordered (possibly infinite) sequence of hypotheses arrives in a stream, and for each hypothesis we observe a p-value along with a set of features specific to that hypothesis. The decision whether or not to reject the current hypothesis must be made immediately at each timestep, before the next hypothesis is observed. We propose a new class of powerful online testing procedures, in which the rejection thresholds are learned sequentially by incorporating contextual information and previous results. Any rule in this class controls online FDR under some standard assumptions. We will also discuss how our proposed procedures increase statistical power over a popular online testing procedure.
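For contrast with the learned, contextual thresholds described above, here is a minimal fixed-schedule online rule (online Bonferroni, an alpha-spending scheme). It controls FWER, and hence FDR, but ignores context and is typically far less powerful; it is only a baseline sketch, not the talk's procedure:

```python
import math

def alpha_spending_thresholds(alpha, n):
    """Fixed rejection thresholds alpha_t = alpha * 6 / (pi^2 * t^2).

    These sum to at most alpha over an infinite stream, so rejecting
    whenever p_t <= alpha_t controls FWER (hence FDR) at level alpha.
    """
    return [alpha * 6.0 / (math.pi**2 * t**2) for t in range(1, n + 1)]

def online_test(p_values, alpha=0.05):
    """Each decision is made immediately as its p-value arrives."""
    thresholds = alpha_spending_thresholds(alpha, len(p_values))
    return [p <= a for p, a in zip(p_values, thresholds)]
```

The weakness is visible in the rapidly shrinking thresholds: late-arriving hypotheses are nearly impossible to reject, which is what adaptive, learned thresholds aim to fix.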
The emerging marketplace for online free services in which service providers (SPs) earn revenue from using consumer data in direct and indirect ways has led to significant privacy concerns. This raises the following question: is there a sustainable market for multiple SPs that offer privacy-differentiated free services? This paper studies the impact of privacy on free online service markets by augmenting the classical Hotelling model for market segmentation analysis. A parametrized game-theoretic model is proposed which captures: (i) the fact that for the free service market, consumers value service not in monetized terms but by the quality of service (QoS); (ii) the differentiator of services is not product price but the privacy risk advertised by an SP; and (iii) consumers' heterogeneous privacy preferences for SPs. For the two-SP problem with uniformly distributed consumer privacy preference and linear SP profit function, the results suggest that: (i) when consumers place a higher value on privacy, it leads to a larger market share for the SP providing untargeted services and a "softened" competition between SPs; (ii) SPs offering high privacy risk services are sustainable only if they offer sufficiently high QoS; and (iii) SPs that are capable of differentiating on services that do not directly use consumer data gain larger market share. Similar results are observed when the consumer's privacy preference is modeled as a truncated Gaussian distribution.
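The classical Hotelling segmentation that the paper augments can be sketched in its textbook price-competition form (the paper replaces price with advertised privacy risk and QoS; this baseline is only the starting point):

```python
# Textbook Hotelling line: consumers uniform on [0, 1], firm 1 at 0 and
# firm 2 at 1, linear transport cost t. A consumer at location x pays
# p1 + t*x to buy from firm 1 and p2 + t*(1 - x) from firm 2, so the
# indifferent consumer sits at x* = 1/2 + (p2 - p1) / (2t).

def market_shares(p1, p2, t):
    """Return (share of firm 1, share of firm 2), clipped to [0, 1]."""
    x_star = 0.5 + (p2 - p1) / (2.0 * t)
    x_star = min(1.0, max(0.0, x_star))
    return x_star, 1.0 - x_star
```

In the paper's variant, the same indifference logic segments consumers by heterogeneous privacy preference rather than by location and price.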
The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are the most consequential developments in information policy in a generation. In this talk, I will explain their strategic goals and the most interesting mechanisms these instruments use to create incentives and disincentives, to structurally strengthen some relationships while disadvantaging others, and to create privacy markets.
No abstract available.
Friday, May 10th, 2019
The question that interests me here is whether human values, such as privacy or non-discrimination, can be translated into computational metrics, and how such translation affects the protection offered by text-driven norms (such as law). This question raises the issue of how the assumptions of computer science, notably those concerning computability, interact with the assumptions of human language and text, notably those concerning the fundamental ambiguity of meaning. This, in turn, relates to the issue of temporality, which seems to play a different role in text and computation: whereas a machine learning algorithm cannot be trained on future data, human language seems inherently focused on integrating anticipation and remembrance. The latter should not be reduced to prediction based on mathematical extrapolation of historical data, but should rather be understood in terms of imagination and its constitutive impact on the world we share (understood in terms of institutional facts). This will, finally, lead me to a juxtaposition between the affordances of computational systems and those of text, arguing that each generates a different type of normativity.
This talk will consider a paradigm where the public generally own and control their data. Here, mechanisms for purchasing data could create huge economic value (and share it fairly). But putting this into practice requires assigning value to data and efficiently matching the mechanism to providers. I will outline some research on this problem utilizing ideas from prediction markets, active learning, and differential privacy. We will see a few important directions for future work. I may also ramble briefly regarding software freedom, labor versus capital, etc.
The literature on fairness in machine learning has yet to settle on definitions. At a high level, there are two families of fairness definitions. "Individual" definitions of fairness ask for semantically meaningful constraints that bind at the level of individuals. When these constraints are defined with respect to the unknown labels, they are often impossible to satisfy. "Statistical" definitions of fairness ask for equality of some error metric (like false positive rate) evaluated over "protected" populations. These are easy to check and satisfy, but don't provide guarantees to individuals.
We show that some of the benefits of both definitions can be obtained if one posits a distribution on problems, in addition to a distribution on datapoints. For example, one may ask that the false positive rates be equalized between every pair of individuals, where now the "rate" is an average over problems, not an average over people. This corresponds to a meaningful promise to individuals. We give oracle-efficient algorithms for optimizing classification error subject to this constraint, and prove generalization bounds over *both* distributions (so that we can make promises both to new individuals and over new problems).
Joint work with Michael Kearns and Saeed Sharifi-Malvajerdi.
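The "average over problems" notion can be made concrete with a small sketch: each individual's false positive rate is computed across a collection of problems, rather than across a population of people. The helper and data below are made up for illustration:

```python
# Per-individual false positive rate, averaged over problems.
# outcomes[i] lists, for individual i, one (prediction, true_label)
# pair per problem; the FPR is the fraction of i's true negatives
# that were predicted positive, across problems.

def per_individual_fpr(outcomes):
    rates = {}
    for i, pairs in outcomes.items():
        negatives = [(pred, y) for pred, y in pairs if y == 0]
        false_positives = sum(1 for pred, y in negatives if pred == 1)
        rates[i] = false_positives / len(negatives)
    return rates

# Hypothetical data: two individuals, four problems each.
outcomes = {
    "alice": [(1, 0), (0, 0), (1, 1), (0, 0)],
    "bob":   [(0, 0), (0, 0), (1, 1), (1, 0)],
}
rates = per_individual_fpr(outcomes)
```

Here both individuals happen to have equal per-problem FPRs, which is the kind of pairwise promise the constraint asks the learner to enforce.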
Accountability is often invoked in describing computer-security mechanisms that complement preventive security, but it lacks a precise, agreed-upon definition. Here, we argue for the need for accountability in computing in a variety of settings, categorize some of the many ways in which the term is used, and propose a punishment-focused view of "accountability." We formalize this view in a utility-theoretic way and then use it to reason about accountability in computing systems. We also survey mechanisms providing various senses of accountability as well as other approaches to reasoning about accountability-related notions.
This is joint work with Joan Feigenbaum and Aaron Jaggard.