Efficient Distribution Estimation via Piecewise Polynomial Approximation

In this talk, we will focus on a core problem in unsupervised learning: how to infer information about a distribution from random samples. We will describe a framework that yields new, provably efficient estimators for several natural and well-studied distribution families, including mixtures of structured distributions (e.g., Gaussian and log-concave distributions).

The key tool that enables our approach is a new algorithm for learning probability distributions that are well approximated by piecewise polynomial density functions. Let $f$ be the density function of an arbitrary univariate distribution, and suppose that $f$ is $\mathrm{OPT}$-close in $L_1$ distance to an unknown piecewise polynomial function with $t$ interval pieces and degree $d$. Our algorithm draws $n = O(t(d+1)/\epsilon^2)$ samples from $f$, runs in time $\widetilde{O}(n \cdot \mathrm{poly}(d))$, and with probability at least $9/10$ outputs an $O(t)$-piecewise degree-$d$ hypothesis $h$ that is $(4\,\mathrm{OPT} + \epsilon)$-close to $f$ in $L_1$ distance.
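To make the shape of this guarantee concrete, below is a minimal sketch of the degree-0 special case: fitting a $t$-piece constant (histogram) hypothesis to samples. This is not the algorithm from the talk, which handles arbitrary degree $d$ with a more careful interval-selection routine; the function names here (`fit_histogram`, `eval_hypothesis`) and the equal-width partition are illustrative assumptions only.

```python
# Hypothetical sketch: the degree-0 (piecewise-constant) special case of
# piecewise polynomial density estimation. Splits the empirical range into
# t equal-width intervals and uses the empirical mass in each interval.
# The actual algorithm in the talk chooses intervals adaptively and fits
# degree-d polynomials per piece; this only illustrates the t-piece,
# n = O(t/eps^2)-sample regime.
import numpy as np

def fit_histogram(samples: np.ndarray, t: int):
    """Return breakpoints and per-piece densities of a t-piece constant fit."""
    lo, hi = samples.min(), samples.max()
    edges = np.linspace(lo, hi, t + 1)            # t equal-width intervals
    counts, _ = np.histogram(samples, bins=edges)
    widths = np.diff(edges)
    densities = counts / (len(samples) * widths)  # normalized so h integrates to 1
    return edges, densities

def eval_hypothesis(x: np.ndarray, edges: np.ndarray, densities: np.ndarray):
    """Evaluate the piecewise-constant hypothesis h at the points x."""
    idx = np.searchsorted(edges, x, side="right") - 1
    idx = np.clip(idx, 0, len(densities) - 1)     # map boundary points inward
    return densities[idx]

# Usage: fit a 20-piece hypothesis to 10,000 standard Gaussian samples.
rng = np.random.default_rng(0)
samples = rng.normal(size=10_000)
edges, dens = fit_histogram(samples, t=20)
print(eval_hypothesis(np.array([0.0, 1.0]), edges, dens))
```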

Our general algorithm yields (near-)sample-optimal and near-linear-time estimators for a wide range of structured distribution families, over both continuous and discrete domains, in a unified way. For most of our applications, these are the first sample-optimal and near-linear-time estimators in the literature. Moreover, we experimentally demonstrate that our algorithm performs very well in practice.

The talk will be based on joint work with S. Chan, R. Servedio, and X. Sun (STOC'14), and on more recent work with J. Acharya, J. Li, and L. Schmidt.
