Many optimization problems that arise in science and engineering are those in which we only have a stochastic approximation to the underlying problem at hand (e.g. linear regression or other problems where our objective function is a sample average). Such problems highlight some of the challenges we face at the interface of computer science and statistics: should we use a highly accurate algorithm (with costly time and space requirements yet returns numerically precise estimates) or a crude stochastic approximation scheme like stochastic gradient descent (which is light on memory and simple to implement, yet has a poor convergence rate)? The tension arises due to the statistical issue of how much accuracy we actually desire of our algorithm. In the absence of computational constraints, the minimizer on a sample average of observed data — the empirical risk minimizer (ERM) — is widely regarded as the estimation strategy of choice due to its desirable statistical convergence properties.
This talk will survey results on both these fronts (stochastic approximation algorithms and recent work on linearly convergent algorithms). Furthermore, this work will provide a simple to implement procedure (applicable to many estimation problems including linear regression) that, under mild regularity conditions, enjoys the following properties:
- The algorithm can be implemented in linear time with just a single pass of the observed data, using space linear in the size of a single sample.
- The algorithm achieves the same statistical rate of convergence as the empirical risk minimizer, on every problem (even considering constant factors).
- The algorithm's performance depends on the initial error at a rate that decreases super-polynomially.
Furthermore, we quantify this rate of convergence, showing that after the size of our dataset exceeds some (problem dependent) threshold, the streaming algorithm rapidly approaches the rate of ERM in expectation.