Abstract

We study the classic Euclidean Minimum Spanning Tree (MST) problem in the Massively Parallel Computation (MPC) model. Given a set X⊂Rd of n points, the goal is to produce a spanning tree for X with weight within a small factor of optimal. Euclidean MST is one of the most fundamental hierarchical geometric clustering algorithms, and with the proliferation of enormous high-dimensional data sets, such as massive transformer-based embeddings, there is now a critical demand for efficient distributed algorithms to cluster such data sets.
In low-dimensional space, where d=O(1), Andoni, Nikolov, Onak, and Yaroslavtsev [STOC '14] gave a constant round MPC algorithm that obtains a high accuracy (1+ϵ)-approximate solution. However, the situation is much more challenging for high-dimensional spaces: the best-known algorithm to obtain a constant approximation requires O(logn) rounds. Recently Chen, Jayaram, Levi, and Waingarten [STOC '22] gave a O~(logn) approximation algorithm in a constant number of rounds based on embeddings into tree metrics. However, to date, no known algorithm achieves both a constant number of rounds and approximation.
In this paper, we make strong progress on this front by giving a constant factor approximation in O~(loglogn) rounds of the MPC model. In contrast to tree-embedding-based approaches, which necessarily must pay Ω(logn)-distortion, our algorithm is based on a new combination of graph-based distributed MST algorithms and geometric space partitions. Additionally, although the approximate MST we return can have a large depth, we show that it can be modified to obtain a O~(loglogn)-round constant factor approximation to the Euclidean Traveling Salesman Problem (TSP) in the MPC model. Previously, only a O(logn) round was known for the problem.

This is a joint work with Rajesh Jayaram, Vahab Mirrokni and Shyam Narayanan