These two lectures recap some basics of Data Science. Topics will include: high-dimensional geometry, concentration inequalities, Gaussian densities and mixtures, Singular Value Decomposition (SVD), Applications of SVD, Markov Chains, Rapid Mixing, Streaming, Randomized Algorithms for Matrices etc. For those familiar with these topics, different proofs of basic theorems (than the usual ones) may be of interest.

### Monday, August 27th, 2018

These two lectures recap some basics of Data Science. Topics will include: high-dimensional geometry, concentration inequalities, Gaussian densities and mixtures, Singular Value Decomposition (SVD), Applications of SVD, Markov Chains, Rapid Mixing, Streaming, Randomized Algorithms for Matrices etc. For those familiar with these topics, different proofs of basic theorems (than the usual ones) may be of interest.

In this tutorial, I'll give an overview of near optimal algorithms for regression, low rank approximation, and a variety of other problems. The results are based on the sketch and solve paradigm, which is a tool for quickly compressing a problem to a smaller version of itself, for which one can then run a slow algorithm on the smaller problem. These lead to the fastest known algorithms for fundamental machine learning and numerical linear algebra problems, which run in time proportional to the number of non-zero entries of the input.

In this tutorial, I'll give an overview of near optimal algorithms for regression, low rank approximation, and a variety of other problems. The results are based on the sketch and solve paradigm, which is a tool for quickly compressing a problem to a smaller version of itself, for which one can then run a slow algorithm on the smaller problem. These lead to the fastest known algorithms for fundamental machine learning and numerical linear algebra problems, which run in time proportional to the number of non-zero entries of the input.

### Tuesday, August 28th, 2018

Following on earlier lectures, I will discuss two additional ways to effectively sketch matrices for a variety of applications: sampling based on *leverage scores*, and the Subsampled Randomized Hadamard Transform.

Stochastic Gradient Descent (SGD) is the basic first-order stochastic optimization algorithm behind powerful deep learning architectures that are becoming increasingly omnipresent in society. In this lecture, we motivate the use of stochastic first-order methods and recall some convergence results for SGD. We then discuss the notion of importance sampling for SGD and how it can improve the convergence rate. Finally, we discuss methods for making SGD more "robust" to hyper-parameters of the algorithm, such as the step size, using "on the fly" adaptive step size methods such as AdaGrad, and present some theoretical results.

Sampling methods have long been ubiquitous in data science and machine learning. Recently, due to their complementary algorithmic and statistical properties, sampling and related sketching methods are central to randomized linear algebra and stochastic optimization. We'll provide an overview of structural properties central to key results in randomized linear algebra, highlighting how sampling and sketching methods can lead to improved results. This is typically achieved in quite different ways, depending on whether one is interested in worst-case linear algebra theory bounds, or whether one is using linear algebra for numerical implementations, statistical bounds, machine learning applications, uses in iterative stochastic optimization algorithms, etc. We'll provide an overview of how sampling and sketching methods are used in randomized linear algebra in each of these different cases, highlighting similarities and differences.

Sampling methods have long been ubiquitous in data science and machine learning. Recently, due to their complementary algorithmic and statistical properties, sampling and related sketching methods are central to randomized linear algebra and stochastic optimization. We'll provide an overview of structural properties central to key results in randomized linear algebra, highlighting how sampling and sketching methods can lead to improved results. This is typically achieved in quite different ways, depending on whether one is interested in worst-case linear algebra theory bounds, or whether one is using linear algebra for numerical implementations, statistical bounds, machine learning applications, uses in iterative stochastic optimization algorithms, etc. We'll provide an overview of how sampling and sketching methods are used in randomized linear algebra in each of these different cases, highlighting similarities and differences.

### Wednesday, August 29th, 2018

Contrary to the scientific computing community which has, wholeheartedly, embraced the second-order optimization algorithms, the machine learning (ML) community has long nurtured a distaste for such methods, in favour of first-order alternatives. When implemented naively, however, second-order methods are clearly not computationally competitive. This, in turn, has unfortunately lead to the conventional wisdom that these methods are not appropriate for large-scale ML applications. In this series of talks, we will provide an overview of various second-order optimization methods and their stochastic variants. We will demonstrate the theoretical properties as well as empirical performance of a variety of efficient Newton-type algorithms for both convex and non-convex problems. In the process, we will highlight the disadvantages of first-order methods and, in their light, showcase the practical advantages offered by appropriate application of second-order information.

Contrary to the scientific computing community which has, wholeheartedly, embraced the second-order optimization algorithms, the machine learning (ML) community has long nurtured a distaste for such methods, in favour of first-order alternatives. When implemented naively, however, second-order methods are clearly not computationally competitive. This, in turn, has unfortunately lead to the conventional wisdom that these methods are not appropriate for large-scale ML applications. In this series of talks, we will provide an overview of various second-order optimization methods and their stochastic variants. We will demonstrate the theoretical properties as well as empirical performance of a variety of efficient Newton-type algorithms for both convex and non-convex problems. In the process, we will highlight the disadvantages of first-order methods and, in their light, showcase the practical advantages offered by appropriate application of second-order information.

I will cover estimation, hypothesis testing, and confidence intervals from a frequentist perspective, and Bayesian statistical inference. Topics in classical asymptotics including consistency, maximum likelihood estimation, asymptotic tests and confidence intervals. No statistics background is assumed.

I will cover estimation, hypothesis testing, and confidence intervals from a frequentist perspective, and Bayesian statistical inference. Topics in classical asymptotics including consistency, maximum likelihood estimation, asymptotic tests and confidence intervals. No statistics background is assumed.

### Thursday, August 30th, 2018

In this tutorial, we'll discuss some phenomena in high dimension, including the distribution of mass (Brunn-Minkowski, Grunbaum), log-concavity (Prékopa-Leindler), extremal properties (Dvoretzky), concentration (Lévy, CLT) and isoperimetry (Poincaré, KLS).

In this tutorial, we'll discuss some phenomena in high dimension, including the distribution of mass (Brunn-Minkowski, Grunbaum), log-concavity (Prékopa-Leindler), extremal properties (Dvoretzky), concentration (Lévy, CLT) and isoperimetry (Poincaré, KLS).

Fitting a model to a collection of observations is one of the quintessential problems in statistics. The typical assumption is that the data was generated by a model of a given type (e.g., a mixture model). This is a simplifying assumption that is only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. Until recently, even for the most basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. A recent line of work in theoretical computer science obtained the first computationally efficient robust estimators for a range of high-dimensional estimation tasks. In this tutorial talk, we will survey the algorithmic techniques underlying these estimators and the connections between them. We will illustrate these techniques for the problems of robust mean and covariance estimation. Finally, we will discuss new directions and opportunities for future work.

Fitting a model to a collection of observations is one of the quintessential problems in statistics. The typical assumption is that the data was generated by a model of a given type (e.g., a mixture model). This is a simplifying assumption that is only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. Until recently, even for the most basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. A recent line of work in theoretical computer science obtained the first computationally efficient robust estimators for a range of high-dimensional estimation tasks. In this tutorial talk, we will survey the algorithmic techniques underlying these estimators and the connections between them. We will illustrate these techniques for the problems of robust mean and covariance estimation. Finally, we will discuss new directions and opportunities for future work.

### Friday, August 31st, 2018

The nearest neighbor search (NNS) problem is defined as follows: Given a set P of n points in some metric space (X, D), build a data structure that, given any point q, returns a point in P that is (approximately) closest to q. In this tutorial, I will survey classic and more recent NNS data structures that are designed for the "high-dimensional" regime. In the first half, I talk about the current state of affairs for NNS over the l_1 and l_2 distances (in particular, Locality-Sensitive Hashing (LSH) and its data-dependent counterpart), while in the second half, I will focus on NNS for non-Euclidean geometries (including some of the very recent developments, such as spectral partitioning for metric spaces).

The nearest neighbor search (NNS) problem is defined as follows: Given a set P of n points in some metric space (X, D), build a data structure that, given any point q, returns a point in P that is (approximately) closest to q. In this tutorial, I will survey classic and more recent NNS data structures that are designed for the "high-dimensional" regime. In the first half, I talk about the current state of affairs for NNS over the l_1 and l_2 distances (in particular, Locality-Sensitive Hashing (LSH) and its data-dependent counterpart), while in the second half, I will focus on NNS for non-Euclidean geometries (including some of the very recent developments, such as spectral partitioning for metric spaces).

As the sizes of modern datasets grow, many classical algorithms become prohibitively expensive, as the input is often too large to be stored in the memory of a single compute node. Streaming algorithms are designed to handle massive inputs using limited space: they process a stream of data items 'on the fly' while maintaining a small memory footprint at all times. In this talk I will discuss classical techniques for solving basic statistical estimation problems (e.g., moment estimation, distinct elements) on data streams, as well as recent results on graph analysis in the streaming model.

As the sizes of modern datasets grow, many classical algorithms become prohibitively expensive, as the input is often too large to be stored in the memory of a single compute node. Streaming algorithms are designed to handle massive inputs using limited space: they process a stream of data items 'on the fly' while maintaining a small memory footprint at all times. In this talk I will discuss classical techniques for solving basic statistical estimation problems (e.g., moment estimation, distinct elements) on data streams, as well as recent results on graph analysis in the streaming model.