Abstract

Given i.i.d samples from an unknown distribution, estimating its symmetric properties is a classical problem in information theory, statistics and computer science. Symmetric properties are those that are invariant to label permutations and include popular functionals such as entropy and support size. Early work on this question dates back to the 1940s when R. A. Fisher and A. S. Corbet studied this to estimate the number of distinct butterfly species in Malaysia. Over the past decade, this question has received great attention leading to computationally efficient and sample optimal estimators for various symmetric properties. All these estimators were property specific and the design of a single estimator that is sample optimal for any symmetric property remained a central open problem in the area. In a recent breakthrough, Acharya et. al. showed that computing an approximate profile maximum likelihood (PML), a distribution that maximizes the likelihood of the observed multiset of frequencies, allows statistically optimal estimation of any symmetric property. However, since its introduction by Orlitsky et. al. in 2004, efficient computation of an approximate PML remained a well known open problem. In our work, we resolved this question by designing the first efficient algorithm for computing an approximate PML distribution. In addition, our investigations have led to a deeper understanding of various computational and statistical aspects of PML and universal estimators.

Video Recording