Abstract

We consider statistical estimations of a matrix product over the integers in a distributed setting, where we have two parties Alice and Bob; Alice holds a matrix A and Bob holds a matrix B, and they want to estimate statistics of AB. We focus on the well-studied l_p-norm, distinct elements (p = 0), and heavy hitter problems. The goal is to minimize both the communication cost and the number of rounds of communication.

This problem is closely related to the fundamental set-intersection join problem in databases: when p = 0 the problem corresponds to the size of the set-intersection join. When p = \infty the output is simply the pair of sets with the maximum intersection size. When p = 1 the problem corresponds to the size of the corresponding natural join. We also consider the heavy hitters problem which corresponds to finding the pairs of sets with intersection size above a certain threshold.

Joint work with David Woodruff.

Video Recording