# IT Seminar

Alexandr Andoni (UC Berkeley)

116 Calvin Lab

## Algorithmic design via efficient data representations

The growing scale of data demands novel algorithmic design frameworks that are able to handle modern datasets. In this talk, I will describe how such frameworks emerge from the methods of efficient data representations. The first illustration will be the Nearest Neighbor Search (NNS) problem --- an ubiquitous massive datasets problem that is of key importance in machine learning and other areas. Its goal is to preprocess a dataset of objects (e.g., images), so that later, given a new query object, one can efficiently return the object most similar to the query. Efficient solutions may be achieved via Locality Sensitive Hashing (LSH), a data representation method that has seen a lot of success in both theory and practice.

I will present the best possible LSH-based algorithm for NNS under the Euclidean distance. Then, I will show a new method that, for the first time, provably outperforms the LSH-based algorithms.

Taking a broader perspective, I will describe other examples where the lens of "efficient data representation" leads to novel fast algorithms. These examples include a classic dynamic programming problem (edit distance), a Linear Program problem (Earth-mover distance), and an algorithmic framework for parallel models of computation (such as MapReduce).