Abstract

The Massively Parallel Computation (MPC) model is a well-adopted theoretical abstraction of popular frameworks for large-scale computation, such as MapReduce, Flume, Hadoop, and Spark. The rapid increase in the size and volume of data prompted many scientific questions about designing very fast MPC algorithms. In particular, what is the relation between MPC and other well-studied distributed/parallel models? How do we leverage MPC capabilities to design efficient algorithms? In this talk, we will address both questions, primarily focusing on a technique for designing MPC algorithms in the near-linear memory regime.