Abstract
One of the most natural and important questions in statistical learning is: how well can a distribution be approximated from its samples? Surprisingly, this question has so far been resolved for only a few approximation measures, for example KL divergence, and even there the known answer is ad hoc and not well understood. We resolve the question for other important approximation measures, such as chi-squared divergence and L1 distance, and, when the probabilities are bounded away from zero, we resolve it for all smooth f-divergence approximation measures, thereby providing a coherent understanding of the rate at which a distribution can be approximated from its samples.
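For reference, the approximation measures above are all instances of the standard f-divergence; the following display is standard background rather than this paper's notation. For distributions p and q over a discrete domain and a convex function f with f(1) = 0,
\[
  D_f(p \,\|\, q) \;=\; \sum_{x} q(x)\, f\!\Bigl(\frac{p(x)}{q(x)}\Bigr),
\]
where f(t) = t \log t recovers KL divergence, f(t) = (t-1)^2 recovers chi-squared divergence, and f(t) = |t-1| recovers L1 distance.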