Abstract

An effective technique for solving optimization problems over massive data sets is to partition the data into smaller pieces, solve the problem on each piece, compute a representative solution from it, and finally obtain a solution inside the union of the representative solutions for all pieces. Such an algorithm can be implemented easily in 2 rounds of MapReduces or be applied in an streaming model. This technique can be captured via the concept of {\em composable core-sets}, and has been recently applied to solve diversity maximization problems as well as several clustering problems. However, for coverage and submodular maximization problems, impossibility bounds are known for this technique. In this talk, after an initial discussion about this technique and applications in diversity maximization and clustering problems, we focus on the submodular and coverage maximization problem, and show how to apply a *randomized variant of composable core-set problem*, and achieve 1/3-and 0.58-approximation for non-monotone and monotone submodualr maximization problems. Finally, I show how to improve the above results to achieve a 1-1/e-approximation for coverage function using a new sketching technique, and show how to design almost optimal streaming and distributed algorithms for coverage problems.

The talk summarizes result from a STOC 2015 (with M. ZadiMoghaddam) and a new paper (with H. Bateni and H. Esfandiari). The survey part of the talk are from three recent papers that appeared in PODS 2014, NIPS 2014, and ICML 2016.

Video Recording