Abstract

Ranking is a fundamental problem that has proved crucial in a wide spectrum of contexts: social choice, sports competitions, web search, and recommendation systems, to name a few. PageRank, the backbone of Google's search engine, is perhaps the most popular ranking paradigm, but it faces a challenge in the big-data era: the vast number of pages on the web makes computing a full ranking computationally expensive; hence, PageRank often serves only as an offline algorithm.
 
To address the challenge, we take a disruptive approach that focuses on identifying only a few significant items, say the top-K ranked items. Under a prominent measurement model in which measurements consist of pairwise information, we characterize, up to a constant factor, the fundamental limit on the sample size required to reliably identify the top-K ranked items. As a consequence, we demonstrate that our top-K ranking approach leads to a significant reduction in sample complexity. In this talk, I will present an algorithm whose running time is nearly linear in the sample size and which achieves this information-theoretic limit.
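To make the pairwise-comparison setting concrete, here is a minimal illustrative sketch, not the algorithm from the talk: items are compared in pairs under a Bradley-Terry-Luce (BTL) style model, and a spectral method in the spirit of Rank Centrality recovers the top-K items from the win counts. All function names, parameters, and the simulation setup below are hypothetical choices made for illustration.

```python
import numpy as np

def top_k_from_pairwise(wins, K, n_iter=5000):
    """Return the indices of the estimated top-K items.

    wins[i, j] = number of times item i beat item j in pairwise comparisons.
    Builds a Markov chain whose stationary distribution places more mass on
    stronger items (the Rank Centrality idea), then ranks by that distribution.
    """
    n = wins.shape[0]
    total = wins + wins.T                      # comparisons per pair
    frac = np.where(total > 0, wins / np.maximum(total, 1), 0.0)
    d_max = max(1, int((total > 0).sum(axis=1).max()))  # max comparison degree
    # Transition i -> j proportional to the fraction of times j beat i.
    P = frac.T / d_max
    np.fill_diagonal(P, 0.0)
    P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)  # self-loops
    # Power iteration to approximate the stationary distribution.
    pi = np.ones(n) / n
    for _ in range(n_iter):
        pi = pi @ P
    return np.argsort(pi)[::-1][:K]

# Hypothetical simulation: 8 items, true BTL scores with a clear top-3 gap,
# and 400 comparisons per pair drawn from the BTL model.
rng = np.random.default_rng(0)
n, K, L = 8, 3, 400
w = np.array([8.0, 6.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0])  # assumed true scores
wins = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        p = w[i] / (w[i] + w[j])              # P(item i beats item j)
        wij = rng.binomial(L, p)
        wins[i, j], wins[j, i] = wij, L - wij

top = top_k_from_pairwise(wins, K)
```

With a large score gap between the top-3 items and the rest, the recovered set `top` should coincide with the truly strongest items `{0, 1, 2}`; the point of the sample-complexity result is that far fewer comparisons suffice to identify this top-K set than to rank all items.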