Abstract

We study the widely used spectral clustering algorithms, i.e. partition a graph into k clusters via (1) embedding the vertices of a graph into a low-dimensional space using the bottom eigenvectors of the Laplacian matrix, and (2) partitioning the embedded points via k-means algorithms. We show that, for a wide class of graphs, spectral clustering algorithms give a good approximation of the optimal clustering. While this approach was proposed in the early 1990s and has comprehensive applications, prior to our work similar results were known only for graphs generated from stochastic models.

We also give a nearly-linear time algorithm for partitioning well-clustered graphs, which is based on heat kernel embeddings and approximate nearest neighbor data structures.