Abstract

Many popular computational primitives in Machine learning and Optimization involve summing up a pairwise function over a large data-set.  Such primitives include computing the Kernel Density, Partition Function, and Gradients of Empirical Loss functions. These procedures  require computing a discrete integral that depends on vector y (query), that is unknown a priori or might change with time. We describe a technique that utilizes ``Distance Sensitive Hashing" to provide fast approximations to such integrals that in certain cases results in sqrt{n} speedups over the previously best known data structures. Our technique extends some recent results from FOCS'17 and essentially amounts to constructing an efficient Monte Carlo-version of the Riemann Integral over the Unit Sphere  for a certain class of functions.

Video Recording