We study spectral algorithms for high-dimensional Nearest Neighbor Searching (NNS). In practice, spectral approaches for NNS are quite popular, and often outperform the random-projection methods that are common in theoretical analyses (e.g., of worst-case datasets). Our work aims to explain this disparity.
Specifically, we consider a semi-random setting where a dataset P in R^d is chosen arbitrarily from an unknown subspace of low dimension (much smaller than d) and then perturbed by full-dimensional Gaussian noise. We design spectral NNS algorithms with query time polynomial in d and log |P| that remain effective even when the noise magnitude is much larger than the interpoint distances in P. These algorithms use repeated computation of the top PCA vector or subspace, akin to practical heuristics.
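To make the setting concrete, here is a minimal sketch (not the paper's actual algorithm) of the simplest spectral heuristic it builds on: generate a toy instance of the semi-random model, project the data onto its top-k PCA subspace, and run exact nearest neighbor search in the projection. All parameters (n, d, k, the noise level, the query construction) are illustrative choices; the benign noise regime used here is far easier than the high-noise regime the paper handles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy semi-random instance (illustrative parameters, not from the paper):
# n points chosen in a hidden k-dimensional subspace of R^d, then
# perturbed by full-dimensional Gaussian noise.
n, d, k = 200, 100, 5
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]        # orthonormal basis of the subspace
low_dim = 20.0 * rng.standard_normal((n, k))                # arbitrary points inside the subspace
P = low_dim @ basis.T + 0.5 * rng.standard_normal((n, d))   # noisy dataset in R^d

def top_pca_subspace(X, k):
    """Top-k principal directions of the centered data, via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T  # d x k matrix whose columns span the PCA subspace

def pca_nn(P, q, k):
    """Project dataset and query onto the top-k PCA subspace of P,
    then do exact nearest neighbor search in the projected space."""
    V = top_pca_subspace(P, k)
    mu = P.mean(axis=0)
    Pk = (P - mu) @ V
    qk = (q - mu) @ V
    return int(np.argmin(np.linalg.norm(Pk - qk, axis=1)))

# A query planted near data point 7 of the hidden subspace, plus noise.
q = low_dim[7] @ basis.T + 0.5 * rng.standard_normal(d)
print(pca_nn(P, q, k))
```

With these benign parameters the projected search recovers the planted neighbor (index 7). When the noise dominates the interpoint distances, a single PCA projection like this is no longer enough, which is where the paper's repeated top-PCA computations come in.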
Joint work with Amirali Abdullah, Alexandr Andoni, and Ravindran Kannan.