Abstract

A number of Latent Variable Models in Machine Learning (including Mixture Models, Topic Models, Stochastic block models and Mixed Membership Community Mod els) can be abstracted to the geometric problem of learn ing a latent polytope K given data points, each obtained by randomly perturbing a latent point in K. The challenge is that perturbations are typically much larger than the dimensions of K and so data points lie (far) outside K. To tackle this, we introduce the “Subset Smoothed” polytope K′ which is the convex hull of (n/k) points, each obtained by averaging a k− subset of the n data points. [k is a parameter.] We will observe that K′ ≈ K under reasonable assumptions on data. We will also observe that K′ has a polynomial time optimization oracle. These simple observations are the starting point of our provable algorithm for learning K which the talk will describe.

Joint Work with Chiranjib Bhattacharyya, Amit Kumar

Video Recording