Abstract
We study spectral algorithms for the high-dimensional Nearest Neighbor Search problem (NNS). In the practice of NNS,spectral approaches are quire popular and often outperform the random-projection methods that otherwise seem theoretically optimal (on worst case datasets). This study aims to address the above disparity in our understanding.
In particular, we consider a semi-random setting where a dataset P in R^d is chosen arbitrarily from an unknown subspace of low dimension k (much less than d), and then perturbed by fully d-dimensional Gaussian noise. We design spectral NNS algorithms whose query time depends polynomially on d and log|P| for large ranges of k and d. Our algorithms use repeated computation of the top PCA vector/subspace, and are effective even when the random-noise magnitude is much larger than the interpoint distances in P.
Joint work with Amirali Abdullah, Ravi Kannan, Robi Krauthgamer.