Abstract

Fast computation typically requires trading off statistical accuracy; here we are interested in whether computation can be significantly improved without trading off accuracy.

In particular, for the best possible accuracy in nearest-neighbor (NN) prediction, the number of neighbors generally needs to grow as a root of n (the sample size), which in turn limits NN search, by any technique, to complexity of order a root of n. In other words, expensive prediction seems unavoidable if accuracy is to be optimal, even with fast search methods; unfortunately, the usual alternative is to trade off accuracy.
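
As a rough illustration (not part of the abstract itself), assume the classical bias-variance balance for k-NN regression of a Lipschitz function in d dimensions: variance scales like 1/k and squared bias like (k/n)^(2/d), so balancing the two gives an optimal k of order n^(2/(2+d)), a polynomial root of n. The short Python sketch below, under that assumption, shows why even an ideal search structure cannot avoid root-of-n per-query cost when k neighbors must be returned.

```python
# Back-of-envelope check, assuming the classical balance for k-NN regression of
# d-dimensional Lipschitz functions: variance ~ 1/k, squared bias ~ (k/n)^(2/d),
# so the optimal number of neighbors grows like n^(2/(2+d)).
for d in (2, 5, 10):
    for n in (10**4, 10**6, 10**8):
        k_opt = n ** (2.0 / (2 + d))
        print(f"d={d:2d}  n={n:>9d}  optimal k ~= {k_opt:12.0f}")

# Even if a search structure locates each neighbor in O(log n) time, simply
# returning k ~ n^(2/(2+d)) neighbors already costs a root of n per query.
```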

Interestingly, we show that it is possible to maintain accuracy, while reducing computation (at prediction time) to just O(log n), through simple bias and or variance correction tricks applied after data quantization or subsampling, together with (black box) fast search techniques. Furthermore, our analysis yields clear insights into how much quantization or subsampling is tolerable if optimal accuracy is to be achieved.
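
To make the general idea concrete, here is a minimal, hypothetical Python sketch, not the authors' implementation: quantize the training set to a small number of centers, keep per-center counts and label means, and answer queries with an off-the-shelf fast search over the centers, weighting each retrieved center by the number of training points it represents (a simple stand-in for a variance correction after quantization). The class name, the choice of MiniBatchKMeans as the quantizer, and the count-weighting rule are illustrative assumptions.

```python
# Illustrative sketch only: count-weighted k-NN regression over a quantized
# training set, with a black-box tree for fast search among the centers.
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.neighbors import KDTree

class QuantizedKNNRegressor:
    def __init__(self, n_centers, k):
        self.n_centers = n_centers   # size of the quantized set, far smaller than n
        self.k = k                   # neighbors searched among the centers

    def fit(self, X, y):
        # Quantize: assign each training point to its nearest center.
        km = MiniBatchKMeans(n_clusters=self.n_centers, n_init=3).fit(X)
        labels = km.labels_
        self.centers_ = km.cluster_centers_
        # Per-center statistics: how many points a center represents, and their mean label.
        self.counts_ = np.bincount(labels, minlength=self.n_centers).astype(float)
        sums = np.bincount(labels, weights=y, minlength=self.n_centers)
        self.means_ = sums / np.maximum(self.counts_, 1.0)
        # Black-box fast search structure built over the small set of centers.
        self.tree_ = KDTree(self.centers_)
        return self

    def predict(self, Xq):
        # Fast search over centers, then a count-weighted average: weighting by the
        # number of represented points corrects for treating each center as if it
        # were a single observation.
        dist, idx = self.tree_.query(Xq, k=self.k)
        w = self.counts_[idx]
        return (w * self.means_[idx]).sum(axis=1) / w.sum(axis=1)
```

With the number of centers and k kept small relative to n, prediction time is dominated by the tree search over the centers rather than by a pass over the full sample; how small the quantized set can be made without hurting accuracy is exactly the kind of question the analysis addresses.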

Our theoretical insights are validated through extensive experiments with large datasets from various domains.  

The talk is based on a series of works with N. Verma, and with L. Xue.
