Abstract

Let (X,d) be an n-point metric space. We assume that (X,d) is given in the distance oracle model, i.e., X={1,...,n} and for every pair of points x,y from X we can query their distance d(x,y) in constant time. A k-nearest neighbor (k-NN) graph} for (X,d) is a directed graph G=(V,E) that has an edge to each of v's k nearest neighbors. We use cost(G) to denote the sum of edge weights of G.

In this paper, we study the problem of approximating cost(G) in sublinear time, when we are given oracle access to the metric space (X,d) that defines G. Our goal is to develop an algorithm that solves this problem faster than the time required to compute G. To this end, we develop an algorithm that in time ~O(min (n k^{3/2} / eps^6, n^2 / (eps^2 k))) computes an estimate K for the cost of the minimum spanning tree that satisfies with probability at least 2/3
 
  |cost(G) -  K | <= eps (cost(G) + mst(X))
 
  where mst(X) denotes the cost of the minimum spanning tree of (X,d).

Joint work with Artur Czumaj. Work was done as part of the speaker's affiliation with Google Switzerland.