The maximum flow problem is one of the most fundamental and well-studied problems in continuous and combinatorial optimization. Over the past 5 years there have been a series of results which showed how to careful combine new combinatorial graph decompositions with new iterative continuous optimization methods to approximately solve the problem on undirected graphs in nearly linear time. In this talk I will show how to develop simple coordinate descent based methods that match the state-of-the-art running times for this problems and in various settings improve upon it. Moreover, I will discuss how this yields an alternative to the recent breakthrough result of Sherman (2017) for solving ℓ∞-regression and leads to faster exact maximum flow algorithms in various settings. This talk is based on joint work with Kevin Tian available on arXiv at https://arxiv.org/abs/1808.01278.

### Monday, December 10th, 2018

We present a new general and simple method for rounding semi-definite programs, based on Brownian motion. Our approach is inspired by recent results in algorithmic discrepancy theory. We develop and present tools for analyzing our new rounding algorithms, utilizing mathematical machinery from the theory of Brownian motion, complex analysis, and partial differential equations. Focusing on constraint satisfaction problems, we apply our method to several classical problems, including Max Cut, Max 2-SAT, and Max DiCut, and derive new algorithms that are competitive with the best known results. We further show the versatility of our approach by presenting simple and natural variants of it, and we numerically demonstrate that they exhibit nearly optimal approximation guarantees for some problems.

Conventional wisdom asserts that linear programs (LPs) perform poorly for max-cut and other constraint satisfaction problems, and that semidefinite programming and spectral methods are needed to give nontrivial bounds. In this talk, I will describe a recent result that stands in contrast to this wisdom: we show that surprisingly small LPs can give nontrivial upper bounds for constraint satisfaction problems. For example, in a random $\Delta$-regular $n$-vertex graph, the $\exp(c \frac{\log n}{\log \Delta})$-round Sherali--Adams LP certifies that the max cut has value at most $50.1\%$. In random graphs with $n^{1.01}$ edges, $O(1)$ rounds suffice; in random graphs with $n \cdot \log(n)$ edges, $n^{O(1/\log \log n)} = n^{o(1)}$ rounds suffice.

We present a blended conditional gradient algorithm for minimizing a smooth convex function over a polytope P, that combines gradient projection steps with conditional gradient steps, achieving linear convergence for strongly convex functions. It does not make use of away steps or pairwise steps, but retains all favorable properties of conditional gradient algorithms, most notably not requiring projections onto P and maintaining iterates as sparse convex combinations of extreme points. The algorithm decreases measures of optimality (primal and dual gaps) rapidly, both in the number of iterations and in wall-clock time, outperforming even the efficient lazified conditional gradient algorithms of Braun et al. [2017]. We also present a streamlined algorithm for the special case in which P is a probability simplex, called simplex gradient descent.

Randomized rounding and iterated rounding are perhaps the two most commonly used techniques in approximation algorithms. However, they are based on very different approaches: one is probabilistic and the other is linear algebraic. I will describe a new rounding procedure that unifies both these methods, and significantly extends the reach of currently known dependent-rounding methods. In the talk, we will see how this leads to various new results for several classic problems such as degree-bounded spanning trees, rounding column sparse linear programs and load-balancing on unrelated machines.

Fair allocation of resources has deep roots in early philosophy, and has been broadly studied in political science, economic theory, operations research, and networking. Over the past decades, an axiomatic approach to fair resource allocation has led to the general acceptance of a class of alpha-fair utility functions parametrized by a single non-negative inequality aversion parameter alpha. In theoretical computer science, the most well-studied examples are linear utilities (alpha = 0), proportionally fair or Nash utilities (alpha = 1), and max-min fair utilities (alpha = infinity). Maximization of alpha-fair utility functions over convex sets leads to alpha-fair allocation vectors, whose fairness properties are well-studied and axiomatically justified. For most applications, a natural feasible set is a packing polytope, and we refer to the corresponding problem as the alpha-fair packing. The special case of alpha = 0 is known as the packing linear program, which has been widely studied in theoretical computer science due to the phenomenon of width-independence. Namely, for packing linear programs it is possible to obtain first-order methods whose running times scale poly-logarithmically with the constraint matrix width -- the maximum ratio of its non-zero elements. By contrast, this is not possible for general linear programming. In this talk, I will present distributed and width-independent first-order methods for solving general alpha-fair packing problems and their minimization counterparts -- alpha-fair covering problems. The convergence times (number of distributed iterations) of these algorithms scale as Õ(1/eps^2) and Õ(1/eps), which greatly improves upon Õ(1/eps^5) and Õ(1/eps^4) previously known only for the alpha-fair packing problems. I will further discuss how special, local, structure of these problems relates to the phenomenon of width-independence, and conclude with a few open problems. Joint work with Maryam Fazel and Lorenzo Orecchia.

We consider a parametric family of polynomial optimization problems over algebraic sets. Although these problems are typically nonconvex, tractable convex relaxations via semidefinite programming (SDP) have been proposed. Often times in applications there is a natural value of the parameters for which the relaxation will solve the problem exactly. We study conditions (and quantitative bounds) under which the relaxation will continue to be exact as the parameter moves in a neighborhood of the original one. Our framework captures several estimation problems such as low rank approximation, camera triangulation, rotation synchronization, approximate matrix completion and approximate GCD. In these applications, a solution is easy under noiseless observations, and our results guarantee that the SDP relaxation will continue to solve the problem in the low-noise regime. Joint work with Diego Cifuentes (MIT), Sameer Agarwal (Google), and Rekha Thomas (U. Washington).

We discuss an impossibility result (Hochbaum 1994) that establishes that there is no strongly polynomial algorithm for any non-linear, non-quadratic, optimization problem. One implication of the result is that the complexity of "solving" continuous problems is only defined for discrete algorithms. We illustrate the use of this result in two contexts. One is for the Markov Random Fields in image segmentation, and the other is the statistical model of isotonic regression and some of its generalizations. For these problems we will describe the best possible polynomial time algorithms which are based on combinatorial optimization.

### Tuesday, December 11th, 2018

The primal-dual schema has been a powerful tool in online optimization and competitive analysis. In a joint project with Bubeck, Cohen, Y. T. Lee, and Madry, we recast some of those methods in the language of entropic regularization and mirror descent. This perspective led to advances in competitive analysis, notably for the infamous k-server problem. While driven by a compelling underlying philosophy, some aspects of the resulting algorithms and analysis are still ad-hoc. I will present joint work with Christian Coester on the first "canonical" instantiation of this method. This is an O(log n)-competitive algorithm for the Metrical Task Systems problem on any HST (hierarchically separated tree), where the underlying algorithm is simply mirror descent on the probability simplex. The algorithm has an appealing local interpretation, where agents sitting at nodes of the HST independently and concurrently modify the marginal probabilities according to a simple update rule. Moreover, it achieves optimal bicriteria "refined" guarantees (in the language of Bansal, Buchbindrer, and Naor).

We consider the $k$-server problem on trees and HSTs. We give an algorithm based on Bregman projections, whose competitive ratio matches those in recent results of Bubeck et al. (STOC 2018). Joint work with Niv Buchbinder, Marco Molinaro, and Seffi Naor.

We would like to briefly describe new potential applications of the Moment-SOS hierarchy (e.g. in computational geometry, in solving some non linear PDE's, and some inverse problems in optimal control (with applications in Humanoid Robotics). Also I would like to introduce the Christoffel function (a well-known tool in approximation theory) which can have potential application in some data analysis problem (outlier detection, density estimation, dimension learning). It is simple tool, easy to use and with no optimization involved.

A selector maps a set (in some set system) to an element in that set. In a metric space, Lipschitz selection is the problem of finding a selector that is Lipschitz with respect to the Hausdorff distance. A classical result is the existence of a Lipschitz selector for convex sets in Euclidean space. In this talk we will prove an *online* version of this classical result. This resolves the 1991 Friedman-Linial conjecture on convex body chasing.

Joint work with Yin Tat Lee, Yuanzhi Li, and Mark Sellke.

This talk focuses on the statistical sample complexity and model reduction of Markov decision process (MDP). We begin by surveying recent advances on the complexity for solving MDP, without any dimension reduction. In the first part we study the statistical state compression of general Markov processes. We propose a spectral state compression method for learning state features and aggregation structures from data. The state compression method is able to “sketch” a black-box Markov process from its empirical data, for which both minimax statistical guarantees and scalable computational tools are provided. In the second part, we propose a bilinear primal-dual pi learning method for finding the optimal policy, which utilizes given state and action features. The method is motivated from a saddle point formulation of the Bellman equation. Its sample complexity depends only on the number of parameters and is variant with respect to the dimension of the problem, making high-dimensional reinforcement learning possible using “small” data.

We consider minimization of a smooth nonconvex objective function using an iterative algorithm based on Newton's method and linear conjugate gradient, with explicit detection and use of negative curvature directions for the Hessian of the objective function. The algorithm closely tracks Newton-conjugate gradient procedures developed in the 1980s, but includes enhancements that allow worst-case complexity results to be proved for convergence to points that satisfy approximate first-order and second-order optimality conditions. Knowledge of such parameters as a bound on the Hessian norm is not required. The complexity results match the best known results in the literature for second-order methods. The talk represents joint work with Clement Royer and Michael O'Neill.

We consider the reinforcement-learning problem of computing an eps-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through samples. Given such a DMDP with states S, actions A, discount factor gamma in (0,1), and rewards in range [0,1] we provide an algorithm which computes an eps-optimal policy with high probability where both the time spent and the number of samples taken are upper bounded by ~ |S||A|(1-gamma)^{-3}eps^{-2}. This improves upon the previously best-known bounds by a factor of (1-gamma)^-1 and matches the information theoretical lower bound up to logarithmic factors.

### Wednesday, December 12th, 2018

In understanding physical systems over hundreds of years, physicists have developed a wealth of dynamics and viewpoints. Some of these methods, when abstracted appropriately, could lead to new algorithmic techniques with applications to optimization, machine learning, and theoretical computer science. I will present a couple of recent examples from my own research on such interactions between Physics and Algorithms.

Given labeled points in a high-dimensional vector space, we seek a low-dimensional subspace such that projecting onto this subspace maintains some prescribed distance between points of differing labels. Intended applications include compressive classification. Taking inspiration from large margin nearest neighbor classification, this paper introduces a semidefinite relaxation of this problem. Unlike its predecessors, this relaxation is amenable to theoretical analysis, allowing us to provably recover a planted projection operator from the data. This is joint work with Dustin Mixon and Culver McWhirter.

The original simplicial method (OSM), a variant of the classic Kelley's cutting plane method, has been shown to converge to the minimizer of a composite convex (g) and submodular (f) objectives g+f, though no rate of convergence for this method was known. Moreover, OSM is required to solve subproblems in each iteration whose size grows linearly in the number of iterations. We propose a limited memory version of Kelley's method (L-KM) and of OSM that requires limited memory (at most n+1 constraints for an n-dimensional problem) independent of the iteration. We prove convergence for L-KM when the convex part (g) of the objective is strongly convex and show it converges linearly when it is also smooth. Our analysis relies on duality between minimization of the composite objectives and the minimization of a convex function over the submodular base polytope. We introduce a limited memory version, L-FCFW, of the Fully-Corrective Frank-Wolfe method with approximate correction, to solve the dual problem. We show that L-FCFW and L-KM are dual algorithms that produce the same sequence of iterates; hence both converge linearly (when g is smooth and strongly convex) and with limited memory. Our results on L-FCFW hold for general polytopes and may be of independent interest.

In this talk, I will explain how to solve linear programs with accuracy epsilon in time n^{omega+o(1)} log(1/epsilon) where omega~2.3728639 is the current matrix multiplication constant. This hits a natural barrier of solving linear programs since linear systems is a special case of linear programs and solving linear systems require time n^omega currently. Joint work with Michael Cohen and Zhao Song.

In the classical secretary problem, a sequence of n elements arrive in a uniformly random order. The goal is to maximize the probability of selecting the largest element (or to maximize the expected value of the selected item). This model captures applications like online auctions, where we want to select the highest bidder. In many such applications, however, one may expect a few outlier arrivals that are adversarially placed in the arrival sequence. Can we still select a large element with good probability? Dynkin’s popular 1/e-secretary algorithm is sensitive to even a single adversarial arrival: if the adversary gives one large bid at the beginning of the stream, the algorithm does not select any element at all. In this work we introduce the Byzantine Secretary problem where we have two kinds of elements: green (good) and red (bad). The green elements arrive uniformly at random. The reds arrive adversarially. The goal is to find a large green element. It is easy to see that selecting the largest green element is not possible even when a small fraction of the arrival is red, i.e., we cannot do much better than random guessing. Hence we introduce the second-max benchmark, where the goal is to select the second-largest green element or better. This dramatically improves our results. We study both the probability-maximization and the value-maximization settings. For probability-maximization, we show the existence of a good randomized algorithm, using the minimax principle. Specifically, we give an algorithm for the known distribution case, based on trying to guess the second-max in hindsight, and using this estimate as a good guess for the future. For value-maximization, we give an explicit poly log^∗ n competitive algorithm, using a multi-layered bucketing scheme that adaptively refines our estimates of second-max as we see more elements. For the multiple secretary problem, where we can pick up to r secretaries, we show constant competitiveness as long as r is large enough. For this, we give an adaptive thresholding algorithm that raises and lowers thresholds depending on the quality and the quantity of recently selected elements. This is joint work with Domagoj Bradac, Anupam Gupta, and Goran Zuzic.

In the Directed Steiner Tree (DST) problem we are given an n-vertex directed edge-weighted graph, a root r, and a collection of k terminal nodes. Our goal is to find a minimum-cost arborescence that contains a directed path from r to every terminal. We present an O(log^2 k/log log{k})-approximation algorithm for DST that runs in quasi-polynomial-time. By adjusting the parameters in the hardness result of Halperin and Krauthgamer [STOC'03], we show the matching lower bound of Omega(log^2{k}/log log{k}) for the class of quasi-polynomial-time algorithms. This is the first improvement on the DST problem since the classical quasi-polynomial-time O(log^3 k) approximation algorithm by Charikar et al. [SODA'98; J. Algorithms'99] (The paper erroneously claims an O(log^2k) approximation due to a mistake in prior work.) Our approach is based on two main ingredients. First, we derive an approximation preserving reduction to the Label-Consistent Subtree (LCST) problem. The LCST instance has quasi-polynomial size and logarithmic height. We remark that, in contrast, Zelikovsky's heigh-reduction theorem used in all prior work on DST achieves a reduction to a tree instance of the related Group Steiner Tree (GST) problem of similar height, however losing a logarithmic factor in the approximation ratio. Our second ingredient is an LP-rounding algorithm to approximately solve LCST instances, which is inspired by the framework developed by Rothvoss [Preprint, 2011]. We consider a Sherali-Adams lifting of a proper LP relaxation of LCST. Our rounding algorithm proceeds level by level from the root to the leaves, rounding and conditioning each time on a proper subset of label variables. A small enough (namely, polylogarithmic) number of Sherali-Adams lifting levels is sufficient to condition up to the leaves.

This is a joint work with Fabrizio Grandoni and Shi Li.

### Thursday, December 13th, 2018

The Moore-Penrose Pseudoinverse (MPP) is a key tool in data analysis. Eg., it can be used to solve least-squares problems. But the MPP is typically dense, and with very large data sets, we may be content with an approximation of MPP that is much sparser. When we have a use case where we need to multiply many vectors by the MPP, we may be willing to put in substantial effort to get a sparse approximation. We use linear programming and (not) semidefinite programming to define and calculate various sparse generalized inverses, and we use a local-search scheme to derive a relevant polynomial-time approximation algorithm. This is joint work with Marcia Fampa (UFRJ) and Victor Fuentes (UM).

We study the problem of graph clustering where the goal is to partition a graph into clusters, i.e. disjoint subsets of vertices, such that each cluster is well connected internally while sparsely connected to the rest of the graph. In particular, we use a natural bicriteria notion motivated by Kannan, Vempala, and Vetta [KVV00] which we refer to as expander decomposition. Expander decomposition has become one of the building blocks in the design of fast graph algorithms, most notably in the nearly linear time Laplacian solver by Spielman and Teng [ST04], and it also has wide applications in practice. For the global version, given graph $G$ and parameter $\phi$, we design $\tilde{O}{m/\phi)$ time algorithm to partition the vertices into clusters such that each cluster induces a subgraph of conductance at least $\phi$, while only a $\tilde{O}(\phi)$ fraction of the edges in the graph have endpoints across different clusters. This is the first nearly linear time algorithm when $\phi$ is at least $1/ polylog m$, which is the case in most practical settings and theoretical applications, and only relies on simple and basic flow-based techniques. Previous results either take $m^{1+o(1)}$ time (e.g. [NS17, Wul17]), or attain nearly linear time but with a weaker expansion guarantee where each output cluster is guaranteed to be contained inside some unknown expander (e.g. [ST13, ACL06]). In the local version of the problem, we are given a seed node $s$, and want to find a good cluster contatining $s$ (if there exists one) in time proportional to the size of the (unkown) cluster. While ﬂow and probability mass diffusion (or more generally, spectral methods) have a long history of competing to provide good graph decomposition, local methods are predominantly based on diffusion. We design the ﬁrst primarily ﬂow-based local method for locating low conductance cuts, and it has exhibited improved theoretical and empirical behavior over classical diﬀusion methods, e.g. PageRank.

One of the prominent open problems in combinatorics is the discrepancy of set systems where each element lies in at most t sets. The Beck-Fiala conjecture suggests that the right bound is $O(\sqrt{t})$, but for three decades the only known bound not depending on the size of set system has been $O(t)$. Arguably we currently lack techniques for breaking that barrier. In this paper we introduce discrepancy bounds based on Fourier analysis and we demonstrate our method on random set systems. Under the assumption that the number of elements n is at least $O(m^2log(n))$, where $m$ is the number of sets, we show that a uniformly random set system will have a discrepancy bound of $1$ with high probability. This talk is based on joint work with Thomas Rothvoss.

Explaining the excellent practical performance of the simplex method for linear programming has been a major topic of research for over 50 years. One of the most successful frameworks for understanding the simplex method was given by Spielman and Teng (JACM `04), who the developed the notion of smoothed analysis. Starting from an arbitrary linear program with d variables and n constraints, Spielman and Teng analyzed the expected runtime over random perturbations of the LP (smoothed LP), where variance sigma^2 Gaussian noise is added to the LP data. In particular, they gave a two-stage shadow vertex simplex algorithm which uses an expected O(d^55 n^86 sigma^{−30}) number of simplex pivots to solve the smoothed LP. Their analysis and runtime was substantially improved by Deshpande and Spielman (FOCS `05) and later Vershynin (SICOMP `09). The fastest current algorithm, due to Vershynin, solves the smoothed LP using an expected O(d^3 log^3 n sigma^{−4}) number of pivots (for small sigma), improving the dependence on n from polynomial to logarithmic.

While the original proof of Spielman and Teng has now been substantially simplified, the resulting analyses are still quite long and complex and the parameter dependencies far from optimal. In this work, we make substantial progress on this front, providing an improved and simpler analysis of the shadow simplex method, where our main algorithm requires an expected O(d^2 sqrt(log n) sigma^{-2}) number of simplex pivots. We obtain our results via an improved shadow bound, key to earlier analyses as well, combined with algorithmic techniques of Borgwardt (ZOR `82) and Vershynin. As an added bonus, our analysis is completely modular, allowing us to obtain non-trivial bounds for perturbations beyond Gaussians, such as Laplace perturbations.

This is joint work with Sophie Huiberts (CWI): https://arxiv.org/abs/1711.05667.