Abstract
Most modern datasets are plagued by missing data or limited sample sizes. However, in many applications we have control over the data sampling process, such as which drug-gene interactions to record, which network routes to probe, or which movies to rate. This raises a natural question: what does the freedom to actively sample data in a feedback-driven manner buy us? Active learning addresses this question for supervised learning. In this talk, I will present work by my group on active sampling methods for several unsupervised learning problems, including matrix and tensor completion/approximation, column subset selection, learning the structure of graphical models, reconstructing graph-structured signals, and clustering, as time permits. I will quantify the precise reduction in the amount of data needed to achieve a desired statistical error, and demonstrate that active sampling often also enables us to handle a larger class of models than passive (non-feedback-driven) sampling, such as matrices with coherent row or column spaces, graphs with heterogeneous degree distributions, and clusters at finer resolutions.