Optimal Quantile Estimation for Streams

Abstract

Estimating quantiles is one of the most basic problems in data sketching. In this problem, we have a stream of n elements from some universe of size U, and are given a rank r and an error parameter ε. The goal is to give a space-efficient algorithm that outputs an element whose rank in the stream is between r-εn and r+εn. For example, one could ask to approximate the median or the 99th percentile of the stream.

Previous results include a comparison-based algorithm which uses O(1/ε log n) words of memory (Greenwald-Khanna 2003) and an algorithm that uses O(1/ε log U) words (Shrivastava-Buragohain-Agrawal-Suri 2004). More recently, it was shown that there is a randomized comparison-based algorithm using O(1/ε log log(1/δ)) words with failure probability δ. It is known that the comparison-based algorithms are best possible, but the question has remained open for general algorithms. In this talk, we will describe a deterministic quantile sketch using O(1/ε) words of memory, improving on the previous algorithms and achieving the optimal bound.

Joint work with Meghal Gupta and Hongxun Wu.

Attachment

Slides

Optimal Quantile Estimation for Streams

Abstract

Attachment

Video Recording