Abstract

Clustering is a fundamental technique in data analysis: the objective is to identify a set of k points (centroids) that minimizes the clustering cost, a function of the distances of data points to their nearest centroid. With very large data sets, we seek efficient constructions of small samples that act as surrogates for the full data. Existing methods that provide quality guarantees are either worst-case, and thus unable to benefit from the structure of real data sets, or make explicit assumptions about that structure. We show here how to avoid both pitfalls using adaptive algorithms, and we demonstrate experimentally large gains of our adaptive constructions over worst-case ones.
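
For concreteness, one standard instantiation of the clustering cost referred to above is the (k,z) objective; this is a minimal sketch assuming that standard form, and the paper's exact objective may differ:

\[
  \mathrm{cost}(X, C) \;=\; \sum_{x \in X} \min_{c \in C} d(x, c)^{z},
\]

where \(X\) is the data set, \(C\) is a set of \(k\) centroids, and \(d\) is the underlying distance; taking \(z = 1\) yields the k-median objective and \(z = 2\) yields k-means.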