Abstract

Estimating quantiles is one of the most basic problems in data sketching. In this problem, we have a stream of n elements from some universe of size U, and are given a rank r and an error parameter ε. The goal is to give a space-efficient algorithm that outputs an element whose rank in the stream is between r-εn and r+εn. For example, one could ask to approximate the median or the 99th percentile of the stream.

Previous results include a comparison-based algorithm which uses O(1/ε log n) words of memory (Greenwald-Khanna 2003) and an algorithm that uses O(1/ε log U) words (Shrivastava-Buragohain-Agrawal-Suri 2004). More recently, it was shown that there is a randomized comparison-based algorithm using O(1/ε log log(1/δ)) words with failure probability δ. It is known that the comparison-based algorithms are best possible, but the question has remained open for general algorithms. In this talk, we will describe a deterministic quantile sketch using O(1/ε) words of memory, improving on the previous algorithms and achieving the optimal bound.

Joint work with Meghal Gupta and Hongxun Wu.