Local Flow-Based Methods for Graph Clustering

Abstract

We study the problem of graph clustering where the goal is to partition a graph into clusters, i.e. disjoint subsets of vertices, such that each cluster is well connected internally while sparsely connected to the rest of the graph. In particular, we use a natural bicriteria notion motivated by Kannan, Vempala, and Vetta [KVV00] which we refer to as expander decomposition. Expander decomposition has become one of the building blocks in the design of fast graph algorithms, most notably in the nearly linear time Laplacian solver by Spielman and Teng [ST04], and it also has wide applications in practice. For the global version, given graph $G$ and parameter $\phi$, we design $\tilde{O}{m/\phi)$ time algorithm to partition the vertices into clusters such that each cluster induces a subgraph of conductance at least $\phi$, while only a $\tilde{O}(\phi)$ fraction of the edges in the graph have endpoints across different clusters. This is the first nearly linear time algorithm when $\phi$ is at least $1/ polylog m$, which is the case in most practical settings and theoretical applications, and only relies on simple and basic flow-based techniques. Previous results either take $m^{1+o(1)}$ time (e.g. [NS17, Wul17]), or attain nearly linear time but with a weaker expansion guarantee where each output cluster is guaranteed to be contained inside some unknown expander (e.g. [ST13, ACL06]). In the local version of the problem, we are given a seed node $s$, and want to find a good cluster contatining $s$ (if there exists one) in time proportional to the size of the (unkown) cluster. While ﬂow and probability mass diffusion (or more generally, spectral methods) have a long history of competing to provide good graph decomposition, local methods are predominantly based on diffusion. We design the ﬁrst primarily ﬂow-based local method for locating low conductance cuts, and it has exhibited improved theoretical and empirical behavior over classical diﬀusion methods, e.g. PageRank.