Events
Fall 2021
Meet the Fellows Welcome Event: Thursday Schedule
Sep 9, 2021 10:00 am – 4:30 pm
Location:
Calvin Lab Auditorium
Wednesday, September 8
Schedule Thursday, September 9
Schedule
Thursday, September 9, 2021
Session 1  Quantum Computing II 
10 a.m. – 10:10 a.m. 
On QuerytoCommunication Lifting for Adversary Bounds
Anurag Anshu (UC Berkeley)  Abstract
We investigate querytocommunication lifting theorems for models related to the quantum adversary bounds. Our results are as follows: 1. We show that the classical adversary bound lifts to a lower bound on randomized communication complexity with a constantsized gadget. We also show that the classical adversary bound is a strictly stronger lower bound technique than the previouslylifted measure known as critical block sensitivity, making our lifting theorem one of the strongest lifting theorems for randomized communication complexity using a constantsized gadget. 2. Turning to quantum models, we show a connection between lifting theorems for quantum adversary bounds and secure 2party quantum computation in a certain "honestbutcurious" model. Under the assumption that such secure 2party computation is impossible, we show that a simplified version of the positiveweight adversary bound lifts to a quantum communication lower bound using a constantsized gadget. We also give an unconditional lifting theorem which lower bounds boundedround quantum communication protocols. 3. Finally, we give some new results in query complexity. We show that the classical adversary and the positiveweight quantum adversary are quadratically related. We also show that the positiveweight quantum adversary is never larger than the square of the approximate degree. Both relations hold even for partial functions.

10:10 a.m. – 10:20 a.m. 
Towards a Threshold of the UnionFind Decoder
Rui Chao (Duke University)  Abstract
The surface code is a leading candidate for practical fault tolerance, partially due to its high thresholds under many known efficient decoders. Among these, the UnionFind decoder recently becomes popular due to its simplicity, efficiency, and flexibility. A promising alternative to the surface code is the more general low density parity check (LDPC) quantum codes, which usually boast nonvanishing rate and asymptotically large distance. However, there have been few decoders for quantum LDPC codes with satisfactory empirical performance. In this talk, I will show a straightforward extension of the UnionFind decoder applicable to the LDPC codes. And I will propose a strategy for proving the existence of threshold for some code families.

10:20 a.m. – 10:30 a.m. 
OneWay Functions Imply Secure Computation in a Quantum World
Andrea Coladangelo (UC Berkeley)  Abstract
In this talk, I will describe one example of how quantum information can be leveraged to realize cryptographic tasks from weaker assumptions than is possible classically.

10:30 a.m. – 10:40 a.m. 
Quantum Algorithms for the Mean Estimation Problem
Yassine Hamoudi (Simons Institute)  Abstract
The problem of estimating the mean of a random variable given i.i.d. samples from it is one of the most basic tasks in statistics and in the Monte Carlo method. Quantum mechanics offers several ways to revisit this question, with new input models and better convergence rates. As an example, we present a series of results culminating in a quantum algorithm that outperforms any classical subGaussian mean estimator.

10:40 a.m. – 10:50 a.m. 
Quantum Information Meets Cryptography
Qipeng Liu (Simons Institute)  Abstract
Quantum information can not be generally cloned. There are recent trends on levering the nocloning theorem to build neverbeforepossible cryptographic protocols (for example, unclonable encryption and copyprotection programs). In this talk, I will present some of my recent works on copyprotection and give interesting directions on where quantum and cryptography can work together.

10:50 a.m. – 11:30 a.m. 
Break

Session 2  Machine Learning II 
11:30 a.m. – 11:40 a.m. 
DistributionFree, RiskControlling Prediction Sets
Stephen Bates (UC Berkeley)  Abstract
To enable valid statistical inference in prediction tasks, we show how to generate setvalued predictions with blackbox models that control various notions of statistical error. Our approach guarantees that the expected loss on future test points falls below a userspecified level, for any predictive model and underlying distribution. Building on conformal prediction, we use a holdout set to calibrate the size of the prediction sets, generalizing the approach to control error notions such as the false rejection rate. We demonstrate our procedure in four largescale problems: (1) multilabel classification, where each observation has multiple associated labels; (2) classification problems where the labels have a hierarchical structure; (3) image segmentation, where we wish to predict a set of pixels containing an object of interest; and (4) protein structure prediction.

11:40 a.m. – 11:50 a.m. 
Learning Under Requirements
Luiz Chamon (UC Berkeley)  Abstract
The transformative power of learning lies in automating the design of complex systems. Yet, learning today does not incorporate requirements organically, leading to datadriven solutions prone to tampering, unsafe behavior, and biased, prejudiced actions. To overcome these challenges, we must develop learning methods capable of satisfying requirements beyond the training data. I study when and how it is possible to learn under requirements by developing the theoretical underpinnings of constrained learning. Some recent results include defining constrained learning by extending the classical PAC framework to show that constrained learning is essentially as statistically hard as unconstrained learning and developing practical learning rules to tackle constrained learning tasks by solving only unconstrained ERM problems, a duality that holds despite the lack of convexity. These advances enable datadriven solutions to adhere to fairness, robustness, and safety specifications.

11:50 a.m. – 12 p.m. 
Deep Neural Tangent Kernel and Laplace Kernel Have the Same RKHS
Lin Chen (Simons Institute)  Abstract
We prove that the reproducing kernel Hilbert spaces (RKHS) of a deep neural tangent kernel and the Laplace kernel include the same set of functions, when both kernels are restricted to the sphere $\mathbb{S}^{d1}$. Additionally, we prove that the exponential power kernel with a smaller power (making the kernel less smooth) leads to a larger RKHS, when it is restricted to the sphere $\mathbb{S}^{d1}$ and when it is defined on the entire $\mathbb{R}^d$.

12 p.m. – 12:10 p.m. 
Proxy Convexity: A Unified Framework for the Analysis of Neural Networks Trained by Gradient Descent
Spencer Frei (UC Berkeley)  Abstract
In this talk, we propose a unified nonconvex optimization framework for the analysis of neural network training. We introduce the notions of proxy convexity and proxy PolyakLojasiewicz (PL) inequalities, which are satisfied if the original objective function induces a proxy objective function that is implicitly minimized when using gradient methods. We show that stochastic gradient descent (SGD) on objectives satisfying proxy convexity or the proxy PL inequality leads to efficient guarantees for proxy objective functions. We further show that many existing guarantees for neural networks trained by gradient descent can be unified through proxy convexity and proxy PL inequalities.

12:10 p.m. – 12:20 p.m. 
On Implicit Regularization in Deep Learning
Wei Hu (UC Berkeley)  Abstract
A major mystery in the theory of deep learning is the ability of modern deep neural networks to generalize well to unseen data despite significant overparameterization. Towards explaining this mystery, it has been suggested that the commonly used gradientbased optimization algorithms enforce certain implicit regularization which effectively constrains the model capacity. This talk will cover some examples in which such implicit regularization can be mathematically characterized.

12:20 p.m. – 12:30 p.m. 
Equilibrium Computation and the Foundations of MultiAgent Machine Learning
Emmanouil Zampetakis (UC Berkeley)  Abstract
Deep Learning has recently yielded important advances in singleagent learning challenges, much of that progress being fueled by the empirical success of gradient descent and its variants in computing local optima of nonconvex optimization problems. In multiagent learning applications, the role of singleobjective optimization is played by equilibrium computation, yet our understanding of its complexity in settings that are relevant for Deep Learning remains sparse. In this talk we focus on minmax optimization of nonconvexnonconcave objectives, which has found applications in GANs, and other adversarial learning problems. Here, not only are there no known gradientdescent based methods converging to even local and approximate minmax equilibria, but the computational complexity of identifying them remains poorly understood. We show that finding approximate local minmax equilibria of Lipschitz and smooth objectives requires a number of queries to the function and its gradient that is exponential in the relevant parameters, in sharp contrast to the polynomial number of queries required to find approximate local minima of nonconvex objectives. Our oracle lower bound is a byproduct of a complexitytheoretic result showing that finding approximate local minmax equilibria is computationally equivalent to finding Brouwer fixed points, and Nash equilibria in non zerosum games, and thus PPADcomplete. Based on joint work with Constantinos Daskalakis and Stratis Skoulakis.

12:30 p.m. – 12:40 p.m. 
Lunch

Session 3  Geometrical Methods in Optimization and Sampling 
2 p.m. – 2:10 p.m. 
A Fast Algorithm for Optimal Transport
Matt Jacobs (Purdue)  Abstract
In this talk I will give a brief introduction to the backandforth method, a new algorithm to efficiently solve the optimal transportation problem for a general class of strictly convex transportation costs. Given two probability measures supported on a discrete grid with n points, the method computes the optimal map in O(n log(n)) operations using O(n) storage space. As a result, the method can compute highly accurate solutions to large scale optimal transportation problems.

2:10 p.m. – 2:20 p.m. 
Approximating Distributions Using WellConditioned Normalizing Flows
Holden Lee (Duke University)  Abstract
Normalizing flows are a probabilistic deep learning model that can generate highresolution natural images, with the key advantage that likelihoods can be explicitly calculated. Affinecoupling models are a particularly common type of normalizing flows where likelihood computation is practically efficient, taking linear time. Despite their widespread use, their restricted structure makes understanding their representational power challenging. Universal approximation (for sufficiently regular distributions) is known, but require nearlysingular Jacobian, which presents an obstacle for training. In this work, we address the question: which distributions can be approximated using wellconditioned affine coupling flows? We show that any logconcave distribution can be approximated. Our proof leverages connections between underdamped Langevin dynamics (a stochastic differential equation often used to sample from Gibbs measures) and Hénon maps (a structured dynamical system that appears in the study of symplectic diffeomorphisms). Joint work with Chirag Pabbaraju, Anish Sevekari, and Andrej Risteski.

2:20 p.m. – 2:30 p.m. 
Iterative Methods and HighDimensional Statistics
Kevin Tian (Stanford University)  Abstract
I will survey my work at the intersection of iterative methods (borrowing from the traditional toolkits of continuous optimization and algorithm design) and modern problems in learning and highdimensional statistics. I will primarily focus on telling the tales of two research vignettes. The first is a generalpurpose "proximal point reduction" developed to further the algorithmic theory of logconcave sampling, which was used to obtain stateoftheart rates for sampling continuous distributions with composite and finitesum structure. The second is a recently developed interplay between fast semidefinite programming and robust statistics in the strong contamination model, which was used e.g. to obtain the first runtime improvement in clustering Gaussian mixture models in nearly 20 years. The talk will aim to be broadly accessible, and will focus on emphasizing the marriage of optimization theory and statistical tools which was used to obtain these recent improvements in highdimensional settings.

2:30 p.m. – 2:40 p.m. 
Geometric Methods for Machine Learning and Optimization
Melanie Weber (Mathematical Institute, Oxford)  Abstract
Many machine learning applications involve nonEuclidean data, such as graphs, strings, or matrices. In such cases, exploiting Riemannian geometry can deliver algorithms that are computationally superior to standard (Euclidean) approaches. This has resulted in an increasing interest in Riemannian methods in the machine learning community. In this talk, I will present two lines of work that utilize Riemannian methods in machine learning. First, we consider the task of learning a robust classifier in hyperbolic space. Such spaces have received a surge of interest for representing largescale, hierarchical data since they achieve better representation accuracy with fewer dimensions. We also discuss conditions under which such hyperbolic methods are guaranteed to outperform their Euclidean counterparts. Secondly, we introduce Riemannian FrankWolfe (RFW) methods for constrained optimization on manifolds. Here, we discuss matrixvalued tasks for which RFW improves on classical Euclidean approaches, including the computation of Riemannian centroids and Wasserstein barycenters.

2:40 p.m. – 2:50 p.m. 
Optimal Transport for Inverse Problems
Yunan Yang (New York University)  Abstract
Recently, we proposed the quadratic Wasserstein distance from optimal transport theory for several inverse problems, tackling the classical leastsquares method's longstanding difficulties such as nonconvexity and noise sensitivity. The work was soon adopted in the oil industry. As we advance, we discover that the advantage of changing the data misfit is more general in a broader class of datafitting problems by examining the preconditioning and "implicit" regularization effects of the Wasserstein distance as the objective function in optimization and as the likelihood function in Bayesian inference. Moreover, since the Wasserstein gradient incorporates geometric features, it fits better for the continuous dependence of the data on the parameter for certain inverse problems.

2:50 p.m. – 3 p.m. 
Graphical Optimal Transport and Its Applications
Yongxin Chen (Georgia Tech)  Abstract
Multimarginal optimal transport (MOT) is a generalization of optimal transport theory to settings with possibly more than two marginals. The computation of the solutions to MOT problems has been a longstanding challenge. In this talk, we introduce graphical optimal transport, a special class of MOT problems. We consider MOT problems from a probabilistic graphical model perspective and point out an elegant connection between the two when the underlying cost for optimal transport allows a graph structure. In particular, an entropy regularized MOT is equivalent to a Bayesian marginal inference problem for probabilistic graphical models with the additional requirement that some of the marginal distributions are specified. This relation on the one hand extends the optimal transport as well as the probabilistic graphical model theories, and on the other hand leads to fast algorithms for MOT by leveraging the welldeveloped algorithms in Bayesian inference. We will cover recent developments of graphical optimal transport in theory and algorithms. We will also go over several applications in aggregate filtering and mean field games.

3 p.m. – 3:10 p.m. 
Modularity and EdgeSampling
Fiona Skerman (Uppsala University)  Abstract
We analyse when community structure of an underlying graph can be determined from an observed subset of the graph. Modularity is a function on graphs which is used in algorithms for community detection. For a given graph G, each partition of the vertices has a modularity score, with higher values indicating that the partition better captures community structure in G. The (max) modularity q*(G) of the graph G is defined to be the maximum over all vertex partitions of the modularity score, and satisfies 0≤q*(G)≤1. In a natural model where we suppose edges in an underlying graph G appear independently with some probability in our observed graph G' we describe how high a sampling probability we need to infer the modularity of the underlying graph from the modularity of the observed graph. Joint work with Colin McDiarmid.
