Spring 2015

IT Seminar

Mar. 25, 2015, 2:30 pm - 4:00 pm

Mesrob Ohannessian (UC San Diego)


2nd floor interaction area

Good-Turing: The Good, The Bad, and The Ugly

The "missing mass" is the total probability of all symbols that do not appear in an i.i.d. sample from a discrete distribution. It captures a fundamental notion of a rare event. Those who attended Alon Orlitsky's talk earlier this month witnessed the glory of the Good-Turing estimator of the missing mass. In this talk, I will first dismantle this impeccable image. In particular, I will show that Good-Turing can fail to learn the missing mass in relative error, even for the simplest light-tailed distributions. I will then build a new reputation for this old estimator, as a highly effective specialized estimator of rare probabilities for heavy-tailed distributions. This explains its success in areas where such distributions arise, such as natural language modeling. This change in perspective opens the door to streamlined estimation techniques that are inspired by extreme value theory and that extend far beyond missing mass estimation.