Power of Active Sampling for Unsupervised Learning

Abstract

Most modern datasets are plagued by missing data or limited sample sizes. However, in many applications we have control over the data sampling process, such as which drug-gene interactions to record, which network routes to probe, or which movies to rate. Thus, we can ask the question: what does the freedom to actively sample data in a feedback-driven manner buy us? Active learning tries to answer this question for supervised learning. In this talk, I will present work by my group on active sampling methods for several unsupervised learning problems, such as matrix and tensor completion/approximation, column subset selection, learning the structure of graphical models, reconstructing graph-structured signals, and clustering, as time permits. I will quantify the precise reduction in the amount of data needed to achieve a desired statistical error, and demonstrate that active sampling often also enables us to handle a larger class of models than passive (non-feedback-driven) sampling, such as matrices with coherent row or column spaces, graphs with heterogeneous degree distributions, and clusters at finer resolutions.
