Abstract
A fundamental problem in data streams is finding the heavy hitters, also known as the top-k, most popular items, frequent items, elephants, or iceberg queries. Several variants of the problem exist, differing in how they quantify what it means for an item to be frequent; these include the l1-heavy hitters and l2-heavy hitters. Algorithmic solutions to these problems begin with the work of Misra and Gries and include the CountMin and CountSketch data structures, among others. In this talk we cover several recent results that improve upon these classical solutions. In particular, with coauthors we develop new algorithms for finding l1-heavy hitters and l2-heavy hitters that require significantly less memory and processing time than previously known algorithms and that are optimal in a number of parameter regimes.
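
For orientation, the classical Misra and Gries approach mentioned above can be sketched roughly as follows. This is a minimal illustration of the textbook baseline only, not of the new algorithms covered in the talk; the function name `misra_gries` and the parameter `k` are illustrative choices, not taken from the abstract. With at most k - 1 counters, any item that occurs more than n/k times in a stream of length n is guaranteed to survive among the counters, and each stored count underestimates the true frequency by at most n/k.

```python
def misra_gries(stream, k):
    """Textbook Misra-Gries summary using at most k - 1 counters.

    Assumption: `stream` is any iterable of hashable items and k >= 2.
    Every item occurring more than n/k times (n = stream length) is
    guaranteed to be a key of the returned dict; other keys may be
    false positives.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters = {}
    for item in stream:
        if item in counters:
            # Item is already tracked: increment its counter.
            counters[item] += 1
        elif len(counters) < k - 1:
            # A free counter is available: start tracking the item.
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            # The incoming item itself is discarded.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

A second pass over the stream, where one is available, can replace the stored underestimates with exact counts and filter out false positives.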