Events
Fall 2021

# Meet the Fellows Welcome Event: Thursday Schedule

Sep 9, 2021 10:00 am – 4:30 pm

 Session 1 Quantum Computing II 10 a.m. – 10:10 a.m. On Query-to-Communication Lifting for Adversary Bounds Anurag Anshu (UC Berkeley) | Abstract We investigate query-to-communication lifting theorems for models related to the quantum adversary bounds. Our results are as follows: 1. We show that the classical adversary bound lifts to a lower bound on randomized communication complexity with a constant-sized gadget. We also show that the classical adversary bound is a strictly stronger lower bound technique than the previously-lifted measure known as critical block sensitivity, making our lifting theorem one of the strongest lifting theorems for randomized communication complexity using a constant-sized gadget. 2. Turning to quantum models, we show a connection between lifting theorems for quantum adversary bounds and secure 2-party quantum computation in a certain "honest-but-curious" model. Under the assumption that such secure 2-party computation is impossible, we show that a simplified version of the positive-weight adversary bound lifts to a quantum communication lower bound using a constant-sized gadget. We also give an unconditional lifting theorem which lower bounds bounded-round quantum communication protocols. 3. Finally, we give some new results in query complexity. We show that the classical adversary and the positive-weight quantum adversary are quadratically related. We also show that the positive-weight quantum adversary is never larger than the square of the approximate degree. Both relations hold even for partial functions. 10:10 a.m. – 10:20 a.m. Towards a Threshold of the Union-Find Decoder Rui Chao (Duke University) | Abstract The surface code is a leading candidate for practical fault tolerance, partially due to its high thresholds under many known efficient decoders. Among these, the Union-Find decoder recently becomes popular due to its simplicity, efficiency, and flexibility. A promising alternative to the surface code is the more general low density parity check (LDPC) quantum codes, which usually boast non-vanishing rate and asymptotically large distance. However, there have been few decoders for quantum LDPC codes with satisfactory empirical performance. In this talk, I will show a straightforward extension of the Union-Find decoder applicable to the LDPC codes. And I will propose a strategy for proving the existence of threshold for some code families. 10:20 a.m. – 10:30 a.m. One-Way Functions Imply Secure Computation in a Quantum World Andrea Coladangelo (UC Berkeley) | Abstract In this talk, I will describe one example of how quantum information can be leveraged to realize cryptographic tasks from weaker assumptions than is possible classically. 10:30 a.m. – 10:40 a.m. Quantum Algorithms for the Mean Estimation Problem Yassine Hamoudi (Simons Institute) | Abstract The problem of estimating the mean of a random variable given i.i.d. samples from it is one of the most basic tasks in statistics and in the Monte Carlo method. Quantum mechanics offers several ways to revisit this question, with new input models and better convergence rates. As an example, we present a series of results culminating in a quantum algorithm that outperforms any classical sub-Gaussian mean estimator. 10:40 a.m. – 10:50 a.m. Quantum Information Meets Cryptography Qipeng Liu (Simons Institute) | Abstract Quantum information can not be generally cloned. There are recent trends on levering the no-cloning theorem to build never-before-possible cryptographic protocols (for example, unclonable encryption and copy-protection programs). In this talk, I will present some of my recent works on copy-protection and give interesting directions on where quantum and cryptography can work together. 10:50 a.m. – 11:30 a.m. Break Session 2 Machine Learning II 11:30 a.m. – 11:40 a.m. Distribution-Free, Risk-Controlling Prediction Sets Stephen Bates (UC Berkeley) | Abstract To enable valid statistical inference in prediction tasks, we show how to generate set-valued predictions with black-box models that control various notions of statistical error. Our approach guarantees that the expected loss on future test points falls below a user-specified level, for any predictive model and underlying distribution. Building on conformal prediction, we use a holdout set to calibrate the size of the prediction sets, generalizing the approach to control error notions such as the false rejection rate. We demonstrate our procedure in four large-scale problems: (1) multi-label classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction. 11:40 a.m. – 11:50 a.m. Learning Under Requirements Luiz Chamon (UC Berkeley) | Abstract The transformative power of learning lies in automating the design of complex systems. Yet, learning today does not incorporate requirements organically, leading to data-driven solutions prone to tampering, unsafe behavior, and biased, prejudiced actions. To overcome these challenges, we must develop learning methods capable of satisfying requirements beyond the training data. I study when and how it is possible to learn under requirements by developing the theoretical underpinnings of constrained learning. Some recent results include defining constrained learning by extending the classical PAC framework to show that constrained learning is essentially as statistically hard as unconstrained learning and developing practical learning rules to tackle constrained learning tasks by solving only unconstrained ERM problems, a duality that holds despite the lack of convexity. These advances enable data-driven solutions to adhere to fairness, robustness, and safety specifications. 11:50 a.m. – 12 p.m. Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS Lin Chen (Simons Institute) | Abstract We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions, when both kernels are restricted to the sphere $\mathbb{S}^{d-1}$. Additionally, we prove that the exponential power kernel with a smaller power (making the kernel less smooth) leads to a larger RKHS, when it is restricted to the sphere $\mathbb{S}^{d-1}$ and when it is defined on the entire $\mathbb{R}^d$. 12 p.m. – 12:10 p.m. Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent Spencer Frei (UC Berkeley) | Abstract In this talk, we propose a unified non-convex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy Polyak-Lojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities. 12:10 p.m. – 12:20 p.m. On Implicit Regularization in Deep Learning Wei Hu (UC Berkeley) | Abstract A major mystery in the theory of deep learning is the ability of modern deep neural networks to generalize well to unseen data despite significant over-parameterization. Towards explaining this mystery, it has been suggested that the commonly used gradient-based optimization algorithms enforce certain implicit regularization which effectively constrains the model capacity. This talk will cover some examples in which such implicit regularization can be mathematically characterized. 12:20 p.m. – 12:30 p.m. Equilibrium Computation and the Foundations of Multi-Agent Machine Learning Emmanouil Zampetakis (UC Berkeley) | Abstract Deep Learning has recently yielded important advances in single-agent learning challenges, much of that progress being fueled by the empirical success of gradient descent and its variants in computing local optima of non-convex optimization problems. In multi-agent learning applications, the role of single-objective optimization is played by equilibrium computation, yet our understanding of its complexity in settings that are relevant for Deep Learning remains sparse. In this talk we focus on min-max optimization of nonconvex-nonconcave objectives, which has found applications in GANs, and other adversarial learning problems. Here, not only are there no known gradient-descent based methods converging to even local and approximate min-max equilibria, but the computational complexity of identifying them remains poorly understood. We show that finding approximate local min-max equilibria of Lipschitz and smooth objectives requires a number of queries to the function and its gradient that is exponential in the relevant parameters, in sharp contrast to the polynomial number of queries required to find approximate local minima of non-convex objectives. Our oracle lower bound is a byproduct of a complexity-theoretic result showing that finding approximate local min-max equilibria is computationally equivalent to finding Brouwer fixed points, and Nash equilibria in non zero-sum games, and thus PPAD-complete. Based on joint work with Constantinos Daskalakis and Stratis Skoulakis. 12:30 p.m. – 12:40 p.m. Lunch Session 3 Geometrical Methods in Optimization and Sampling 2 p.m. – 2:10 p.m. A Fast Algorithm for Optimal Transport Matt Jacobs (Purdue) | Abstract In this talk I will give a brief introduction to the back-and-forth method, a new algorithm to efficiently solve the optimal transportation problem for a general class of strictly convex transportation costs. Given two probability measures supported on a discrete grid with n points, the method computes the optimal map in O(n log(n)) operations using O(n) storage space. As a result, the method can compute highly accurate solutions to large scale optimal transportation problems. 2:10 p.m. – 2:20 p.m. Approximating Distributions Using Well-Conditioned Normalizing Flows Holden Lee (Duke University) | Abstract Normalizing flows are a probabilistic deep learning model that can generate high-resolution natural images, with the key advantage that likelihoods can be explicitly calculated. Affine-coupling models are a particularly common type of normalizing flows where likelihood computation is practically efficient, taking linear time. Despite their widespread use, their restricted structure makes understanding their representational power challenging. Universal approximation (for sufficiently regular distributions) is known, but require nearly-singular Jacobian, which presents an obstacle for training. In this work, we address the question: which distributions can be approximated using well-conditioned affine coupling flows? We show that any log-concave distribution can be approximated. Our proof leverages connections between underdamped Langevin dynamics (a stochastic differential equation often used to sample from Gibbs measures) and Hénon maps (a structured dynamical system that appears in the study of symplectic diffeomorphisms). Joint work with Chirag Pabbaraju, Anish Sevekari, and Andrej Risteski. 2:20 p.m. – 2:30 p.m. Iterative Methods and High-Dimensional Statistics Kevin Tian (Stanford University) | Abstract I will survey my work at the intersection of iterative methods (borrowing from the traditional toolkits of continuous optimization and algorithm design) and modern problems in learning and high-dimensional statistics. I will primarily focus on telling the tales of two research vignettes. The first is a general-purpose "proximal point reduction" developed to further the algorithmic theory of logconcave sampling, which was used to obtain state-of-the-art rates for sampling continuous distributions with composite and finite-sum structure. The second is a recently developed interplay between fast semidefinite programming and robust statistics in the strong contamination model, which was used e.g. to obtain the first runtime improvement in clustering Gaussian mixture models in nearly 20 years. The talk will aim to be broadly accessible, and will focus on emphasizing the marriage of optimization theory and statistical tools which was used to obtain these recent improvements in high-dimensional settings. 2:30 p.m. – 2:40 p.m. Geometric Methods for Machine Learning and Optimization Melanie Weber (Mathematical Institute, Oxford) | Abstract Many machine learning applications involve non-Euclidean data, such as graphs, strings, or matrices. In such cases, exploiting Riemannian geometry can deliver algorithms that are computationally superior to standard (Euclidean) approaches. This has resulted in an increasing interest in Riemannian methods in the machine learning community. In this talk, I will present two lines of work that utilize Riemannian methods in machine learning. First, we consider the task of learning a robust classifier in hyperbolic space. Such spaces have received a surge of interest for representing large-scale, hierarchical data since they achieve better representation accuracy with fewer dimensions. We also discuss conditions under which such hyperbolic methods are guaranteed to outperform their Euclidean counterparts. Secondly, we introduce Riemannian Frank-Wolfe (RFW) methods for constrained optimization on manifolds. Here, we discuss matrix-valued tasks for which RFW improves on classical Euclidean approaches, including the computation of Riemannian centroids and Wasserstein barycenters. 2:40 p.m. – 2:50 p.m. Optimal Transport for Inverse Problems Yunan Yang (New York University) | Abstract Recently, we proposed the quadratic Wasserstein distance from optimal transport theory for several inverse problems, tackling the classical least-squares method's longstanding difficulties such as nonconvexity and noise sensitivity. The work was soon adopted in the oil industry. As we advance, we discover that the advantage of changing the data misfit is more general in a broader class of data-fitting problems by examining the preconditioning and "implicit" regularization effects of the Wasserstein distance as the objective function in optimization and as the likelihood function in Bayesian inference. Moreover, since the Wasserstein gradient incorporates geometric features, it fits better for the continuous dependence of the data on the parameter for certain inverse problems. 2:50 p.m. – 3 p.m. Graphical Optimal Transport and Its Applications Yongxin Chen (Georgia Tech) | Abstract Multi-marginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for MOT by leveraging the well-developed algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games. 3 p.m. – 3:10 p.m. Modularity and Edge-Sampling Fiona Skerman (Uppsala University) | Abstract We analyse when community structure of an underlying graph can be determined from an observed subset of the graph. Modularity is a function on graphs which is used in algorithms for community detection. For a given graph G, each partition of the vertices has a modularity score, with higher values indicating that the partition better captures community structure in G. The (max) modularity q*(G) of the graph G is defined to be the maximum over all vertex partitions of the modularity score, and satisfies 0≤q*(G)≤1. In a natural model where we suppose edges in an underlying graph G appear independently with some probability in our observed graph G' we describe how high a sampling probability we need to infer the modularity of the underlying graph from the modularity of the observed graph. Joint work with Colin McDiarmid.