Abstract

Sampling is a powerful technique at the core of statistical data analysis and machine learning. Using a finite, often small, set of observations, we attempt to estimate properties of an entire sample space. How good are estimates obtained from a sample? Any rigorous application of sampling requires an understanding of the sample complexity of the problem, that is, the minimum sample size needed to obtain the required results. In this tutorial we will cover some of the rich mathematical theory that has been developed in recent years to address this question, in particular in the context of statistical machine learning and rigorous data mining.

Main topics: Uniform convergence; VC-dimension (the ϵ-net and ϵ-sample theorems); Rademacher complexity; Applications in machine learning and data mining.

The second session of this mini course will take place on Wednesday, August 24th, 2016, 10:15 am – 11:00 am.

Video Recording