Fingerprinting is a widely-used technique for efficiently verifying that two large files are identical. More generally, sketching is a form of lossy compression based on random linear projections that, for example, could be used to estimate the Hamming distance between the files. Both techniques are particularly useful for monitoring data streams or minimizing communication in distributed and parallel computing. Many applications have been studied extensively including sparse signal recovery, locality sensitive hashing, and linear regression.
Our goal is to extend these popular techniques and design random projections that preserve combinatorial and group-theoretic structure in large data sets. In this talk, we present recent results on analyzing spectral graph properties and fingerprinting misaligned text. Both results rely on developing homomorphic sketches that support operations on the compressed data that correspond to operations on the uncompressed data. Applications include the first sublinear space algorithms for processing dynamic graph streams.
Includes joint work with Kook Jin Ahn, Alexandr Andoni, Assaf Goldberger, Sudipto Guha, and Ely Porat.