Classical streaming algorithms typically do not leverage data properties or patterns in their input. We propose to augment such algorithms with a learning model that enables them to exploit data properties without being specific to a particular pattern or property. We focus on the problem of estimating the frequency of elements in a data stream, a fundamental problem with applications in network measurements, natural language processing, and security. We propose a new framework for designing frequency estimation streaming algorithms that automatically learn to leverage the properties of the input data. We present a theoretical analysis of the proposed algorithms and prove that, under natural assumptions, they have lower space complexity than prior algorithms. We also evaluate our algorithms on two problems ? monitoring Internet traffic and tracking the popularity of search queries ? and demonstrate their performance gains. Joint work with Chen-Yu Hsu, Dina Katabi and Ali Vakilian.

### Tuesday, November 27th, 2018

In sparse recovery/compressed sensing, one can estimate a k-sparse vector in n dimensions with only Theta(k log n) nonadaptive linear measurements. With adaptivity -- if each measurement can be based on the previous ones -- this reduces to O(k log log n). But what happens if the measurement matrices can only be chosen in a few rounds, as seen (for example) in constant-pass streaming algorithms? This talk will give upper and lower bounds, showing (up to a log^* k factor) that R rounds of adaptivity require Theta(k log^{1/R} n) measurements.

Streaming and sketching algorithms have found many applications in computer science and other areas. A typical sketching algorithm approximates one function. Given a class of functions F, it is natural to ask if it is possible to compute a single sketch S that will approximate every function f from F. We call S a “universal sketch” for F. In this talk we will discuss results on universal sketches for several classes of functions. For example, we will describe a sketch that approximates a sub-class of symmetric norms (a norm is symmetric if it is invariant under sign-flips and coordinate-permutations) and outline a connection between universal sketches and concentration of measure and Milman’s theorem. Also, we will describe a recent result for subset (i.e. 0-1 weighted) l0 and l1 norms. For these problems we obtain a nearly optimal upper and lower bounds for streaming space complexity. We will discuss the applicability of universal sketches for Software Defined Networks (SDN). For SDN, we will present the UnivMon (short for Universal Monitoring) framework that can simultaneously achieve both generality and high ﬁdelity across a broad spectrum of monitoring tasks.

This talk is based on joint works with Jaroslaw Blasiok, Stephen R. Chestnut, Robert Krauthgamer and Lin F. Yang (STOC 2017), with Robert Krauthgamer and Lin F. Yang (submitted) and with Zaoxing Liu, Antonis Manousis, Gregory Vorsanger, and Vyas Sekar (HotNets 2015, SIGCOMM 2016).

Let (X,d) be an n-point metric space. We assume that (X,d) is given in the distance oracle model, i.e., X={1,...,n} and for every pair of points x,y from X we can query their distance d(x,y) in constant time. A k-nearest neighbor (k-NN) graph} for (X,d) is a directed graph G=(V,E) that has an edge to each of v's k nearest neighbors. We use cost(G) to denote the sum of edge weights of G.

In this paper, we study the problem of approximating cost(G) in sublinear time, when we are given oracle access to the metric space (X,d) that defines G. Our goal is to develop an algorithm that solves this problem faster than the time required to compute G. To this end, we develop an algorithm that in time ~O(min (n k^{3/2} / eps^6, n^2 / (eps^2 k))) computes an estimate K for the cost of the minimum spanning tree that satisfies with probability at least 2/3

|cost(G) - K | <= eps (cost(G) + mst(X))

where mst(X) denotes the cost of the minimum spanning tree of (X,d).

Joint work with Artur Czumaj. Work was done as part of the speaker's affiliation with Google Switzerland.

Independent samples from an unknown probability distribution p on a domain of size k are distributed across n players, with each player holding one sample. Each player can send a message to a central referee in a simultaneous message passing (SMP) model of communication, whose goal is to solve a pre-specified inference problem. The catch, however, is that each player cannot simply send their own sample to the referee; instead, the message they send must obey some (local) information constraint. For instance, each player may be limited to communicating only L bits, where L << log k; or they may seek to reveal as little information as possible, and preserve local differentially privacy. We propose a general formulation for inference problems in this distributed setting, and instantiate it to two fundamental inference questions, learning and uniformity testing. We study the role of randomness for those questions, and obtain striking separations between public- and private-coin protocols for the latter, while showing the two settings are equally powerful for the former. (Put differently, sharing with your neighbors does help a lot for the test, but not really for the learning.) Based on joint works with Jayadev Acharya (Cornell University), Cody Freitag (Cornell University), and Himanshu Tyagi (IISc Bangalore).

We initiate the study of the role of erasures in local decoding and use our understanding to prove a separation between erasure-resilient and tolerant property testing. Local decoding in the presence of errors has been extensively studied, but has not been considered explicitly in the presence of erasures. Motivated by applications in property testing, we begin our investigation with local {\em list} decoding in the presence of erasures. We prove an analog of a famous result of Goldreich and Levin on local list decodability of the Hadamard code. Specifically, we show that the Hadamard code is locally list decodable in the presence of a constant fraction of erasures, arbitrary close to 1, with list sizes and query complexity better than in the Goldreich-Levin theorem. We use this result to exhibit a property which is testable with a number of queries independent of the length of the input in the presence of erasures, but requires a number of queries that depends on the input length, $n$, for tolerant testing. We further study {\em approximate} locally list decodable codes that work against erasures and use them to strengthen our separation by constructing a property which is testable with a constant number of queries in the presence of erasures, but requires $n^{\Omega(1)}$ queries for tolerant testing. Next, we study the general relationship between local decoding in the presence of errors and in the presence of erasures. We observe that every locally (uniquely or list) decodable code that works in the presence of errors also works in the presence of twice as many erasures (with the same parameters up to constant factors). We show that there is also an implication in the other direction for locally decodable codes (with unique decoding): specifically, that the existence of a locally decodable code that works in the presence of erasures implies the existence of a locally decodable code that works in the presence of errors and has related parameters. However, it remains open whether there is an implication in the other direction for locally {\em list} decodable codes. (Our Hadamard result shows that there has to be some difference in parameters for some settings.) We relate this question to other open questions in local decoding. Based on joint work with Noga Ron-Zewi and Nithin Varma.

No abstract available.

### Wednesday, November 28th, 2018

Locality sensitive hashing (LSH) is a popular technique for nearest neighbor search in high dimensional data sets. Recently, a new view at LSH as a biased sampling technique has been fruitful for density estimation problems in high dimensions. Given a set of points and a query point, the goal (roughly) is to estimate the density of the data set around the query. One way to formalize this is by kernel density estimation: Given a function that decays with distance and represents the "influence" of a data point at the query, sum up this influence function over the data set. Yet another way to formalize this problem is by counting the number of data points within a certain radius of the query. While these problems can easily be solved by making a linear pass over the data, this can be prohibitive for large data sets and multiple queries. Can we preprocess the data so as to answer queries efficiently? This talk will survey several recent papers that use locality sensitive hashing to design unbiased estimators for such density estimation problems and their extensions. This talk will survey joint works with Arturs Backurs, Piotr Indyk, Vishnu Natchu, Paris Syminelakis and Xian (Carrie) Wu.

We consider algorithms that take an unlabeled data set and label it in its entirety, given the ability to interact with a human expert. The goal is to minimize the amount of interaction while producing a labeling that satisfies an (epsilon, delta) guarantee: with probability at least 1-delta over the randomness in the algorithm, at most an epsilon fraction of the labels are incorrect. Scenario 1: The algorithm asks the expert for labels of specific points. This is the standard problem of active learning, except that the final product is a labeled data set rather than a classifier. Scenario 2: The expert also provides "weak rules" or helpful features. We will summarize the state of the art on these problems, in terms of promising algorithms and statistical guarantees, and identify key challenges and open problems.

We consider the problem of estimating the value of MAX-CUT in a graph in the streaming model of computation. At one extreme, there is a trivial $2$-approximation for this problem that uses only $O(\log n)$ space, namely, count the number of edges and output half of this value as the estimate for the size of the MAX-CUT. On the other extreme, for any fixed $\eps > 0$, if one allows $\tilde{O}(n)$ space, a $(1+\eps)$-approximate solution to the MAX-CUT value can be obtained by storing an $\tilde{O}(n)$-size sparsifier that essentially preserves MAX-CUT value.

Our main result is that any (randomized) single pass streaming algorithm that breaks the $2$-approximation barrier requires $\Omega(n)$-space, thus resolving the space complexity of any non-trivial approximations of the MAX-CUT value to within polylogarithmic factors in the single pass streaming model. We achieve the result by presenting a tight analysis of the Implicit Hidden Partition Problem introduced by Kapralov et al.[SODA'17] for an arbitrarily large number $k$ of players. In this problem a number of players receive random matchings of $\Omega(n)$ size together with random bits on the edges, and their task is to determine whether the bits correspond to parities of some hidden bipartition, or are just uniformly random.

Unlike all previous Fourier analytic communication lower bounds, our analysis does not directly use bounds on the $\ell_2$ norm of Fourier coefficients of a typical message at any given weight level that follow from hypercontractivity. Instead, we use the fact that graphs received by players are sparse (matchings) to obtain strong upper bounds on the $\ell_1$ norm of the Fourier coefficients of the messages of individual players using their special structure, and then argue, using the convolution theorem, that similar strong bounds on the $\ell_1$ norm are essentially preserved (up to an exponential loss in the number of players) once messages of different players are combined. We feel that our main technique is likely of independent interest.

Any graph with maximum degree Delta admits a proper vertex coloring with Delta+1 colors that can be found via a simple sequential greedy algorithm in linear time and space. But can one find such a coloring via a sublinear algorithm? In this talk, I present new algorithms that answer this question in the affirmative for several canonical classes of sublinear algorithms including graph streaming, sublinear time, and massively parallel computation (MPC) algorithms. At the core of these algorithms is a remarkably simple meta-algorithm for the (Delta+1) coloring problem: Sample O(log n) colors for each vertex uniformly at random from the Delta+1 colors and then find a proper coloring of the graph using the sampled colors; our main structural result states that the sampled set of colors with high probability contains a proper coloring of the input graph.

Efficient algorithms for k-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect k-means optimality. We discuss an a posteriori certifier of approximate optimality for k-means clustering based on Peng and Wei's semidefinite relaxation of k-means.

Efficient algorithms for k-means clustering frequently converge to suboptimal partitions, and given a partition, it is difficult to detect k-means optimality. In this paper, we develop an a posteriori certifier of approximate optimality for k-means clustering. The certifier is a sub-linear Monte Carlo algorithm based on Peng and Wei's semidefinite relaxation of k-means. In particular, solving the relaxation for small random samples of the dataset produces a high-confidence lower bound on the k-means objective, and being sub-linear, our algorithm is faster than k-means++ when the number of data points is large. If the data points are drawn independently from any mixture of two Gaussians over R^m with identity covariance, then with probability 1?O(1/m), our poly(m)-time algorithm produces a 3-approximation certificate with 99% confidence (no separation required). We also introduce a linear-time Monte Carlo algorithm that produces an O(k) additive approximation lower bound.

### Thursday, November 29th, 2018

We provide a novel ? and to the best of our knowledge, the first ? algorithm for high dimensional sparse regression with corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators with sub-linear sample complexity. Our algorithm consists of a novel randomized outlier removal technique for robust sparse mean estimation that may be of interest in its own right: it is orderwise more efficient computationally than existing algorithms, and succeeds with high probability, thus making it suitable for general use in iterative algorithms.

We study the PCA and column subset selection problems in matrices in an online setting, where the columns arrive one after the other. In the context of column subset selection, the goal is to decide whether to include or discard a column, as it arrives. We design a simple algorithm that includes at most O(k \polylog n) columns overall and achieves a multiplicative (1+\epsilon) error compared to the best rank-k approximation of the full matrix. This result may be viewed as an analog of the classic result of Myerson on online clustering.

The need for fast computation typically requires tradeoffs with statistical accuracy; here we are interested in whether computation can be significantly improved without trading-off accuracy.

In particular, for best possible accuracy in NN prediction, the number of neighbors generally needs to grow as a root of n (sample size), consequently limiting NN-search (any technique) to order of root of n complexity; in other words, expensive prediction seems unavoidable, even while using fast search methods, if accuracy is to be optimal. Unfortunately, the usual alternative is to tradeoff accuracy.

Interestingly, we show that it is possible to maintain accuracy, while reducing computation (at prediction time) to just O(log n), through simple bias and or variance correction tricks applied after data quantization or subsampling, together with (black box) fast search techniques. Furthermore, our analysis yields clear insights into how much quantization or subsampling is tolerable if optimal accuracy is to be achieved.

Our theoretical insights are validated through extensive experiments with large datasets from various domains.

The talk is based on a series of works with N. Verma, and with L. Xue.

We establish a generic reduction from nonlinear spectral gaps of metric spaces to space partitions, in the form of data-dependent Locality-Sensitive Hashing. This yields a new approach to the high-dimensional Approximate Near Neighbor Search problem (ANN). Using this reduction, we obtain a new ANN data structure under an arbitrary d-dimensional norm, where the query algorithm makes only a sublinear number of probes into the data structure. Most importantly, the new data structure achieves a O(log d) approximation for an arbitrary norm. The only other such generic approach, via John's ellipsoid, would achieve square-root-d approximation only. Joint work with Assaf Naor, Aleksandar Nikolov, Ilya Razenshteyn, and Erik Waingarten.

I will show the first approximate nearest neighbor search data structure for a general d-dimensional normed space with sub-polynomial in d approximation.

The main tool is a finite-dimensional quantitative version of a theorem of Daher, which yields a Holder homeomorphism between small perturbations of a normed space of interest and a Euclidean space. To make Daher's theorem algorithmic, we employ convex programming to compute the norm of a vector in a space, which is the result of complex interpolation between two given normed spaces.

Based on a joint work (FOCS 2018) with Alex Andoni, Assaf Naor, Sasho Nikolov and Erik Waingarten.

Theoretical work on high-dimensional nearest neighbor search has focused on the setting where a single point is sought within a known search radius, and an acceptable approximation ratio c is given. Locality Sensitive Hashing is a powerful framework for addressing this problem. In practice one usually seeks the (exact) k nearest points, the search radius is unknown, and the parameter c must be chosen in a way that depends on the data distribution. Though reductions of the latter problem to the former exist, they incur polylogarithmic overhead in time and/or space, which in turn make them unattractive in many practical settings. We address this discrepancy between theory and practice by suggesting new, simple, more efficient reductions for solving the k-Nearest Neighbor search problem using Locality Sensitive Hashing. Joint work with Tobias Christiani and Mikkel Thorup.

### Friday, November 30th, 2018

The main story of this talk is how theoretical ideas on randomized sampling algorithms became part of production code at Twitter. The specific context is finding pairs of similar items: a classic algorithmic problem that is an integral part of recommendation systems. In most incarnations, it boils down to finding high inner products among a large collection of vectors, or alternately high entries in a matrix product. Despite a rich literature on this topic (and despite Twitter's significant compute resources), none of the existing methods scaled to "industrial sized" inputs, which exceed hundreds of billions of non-zeros. I will talk about a distributed algorithm for this problem, that combines low-dimension projections (hashes) with path-sampling techniques (wedges). There is some cute math behind the algorithm, and we were able to run it in production on Twitter's recommendation system. Joint work with Aneesh Sharma (Twitter) and Ashish Goel (Stanford).

In this talk I will discuss how to recover spectral approximations to broad classes of structured matrices using only a polylogarithmic number of adaptive linear measurements to either the matrix or its inverse. Leveraging this result I will discuss how to achieve faster algorithms for solving a variety of linear algebraic problems including solving linear systems in the inverse of symmetric M-matrices (a generalization of Laplacian systems), solving linear systems that are constant spectral approximations of Laplacians (or more generally, SDD matrices), and recovering a spectral sparsifier of a graph using only a polylogarithmc number of matrix vector multiplies. More broadly this talk will show how to leverage a number of recent approaches to spectral sparsification towards expanding the robustness and scope of recent nearly linear time linear system solving research, and providing general matrix recovery machinery that may serve as a stepping stone for faster algorithms. This talk reflects joint work with Arun Jambulapati and Kiran Shiragur.

Spatial Scan Statistics measure and detect anomalous spatial behavior, specifically they identify geometric regions where significantly more of a measured characteristic is found than would be expected from the background distribution. These techniques have been used widely in geographic information science, such as to pinpoint disease outbreaks. However, until recently, available algorithms and software only scaled to at most a thousand or so spatial records. In this work I will describe how using coresets, efficient constructions, and scanning algorithms, we have developed new algorithms and software that easily scales to millions or more data points. Along the way we provide new efficient algorithms and constructions for eps-samples and eps-nets for various geometric range spaces. This is a case where subtle theoretical improvements of old structures from discrete geometry actually result in substantial empirical improvements.