We address the fundamental problem of reliably learning about an unknown probability distribution from finite samples generated by that distribution. In particular, in this talk we analyze the sample complexity of coming up with an approximation to the unknown sample-generating distribution, in the strong sense of total variation distance, when the learner is given a family of dictributions that contains a good approximation to that target.
In particular, we prove that up to logarithmic factors O(kd^2/epsilon^2) samples are both necessary and sufficient for learning a mixture of k Gaussians in R^d. We also get tight upper bounds for mixtures of axis-aligned Gaussians.
Our main tool for proving the upper bounds is a novel notion of sample compression schemes for distributions.
This is joint work with Hassan Ashtiani, Nick Harvey, Chris Liaw, Abbas Mehrabian and Yniv Plan.