Abstract

In the active linear regression problem, one wants to solve min{Φ(Ax − b) : x ∈ R^n} for some loss function Φ : R^m → R_+, while examining as few of the entries of b = (b_1, . . . , b_m) as possible. For least-squares regression, where Φ(y) = ∥y∥_2, it is known that one can sample O(n log n) rows of A proportional to their statistical leverage scores and simply solve the regression problem on the downsampled instance. Recently it was shown that sampling proportional to the ℓ_1 Lewis weights of A similarly solves the problem with O(n log n) samples when Φ(y) = ∥y∥_1. For ℓ_p regression with p ≠ 1, 2, i.e., Φ(y) = ∥y∥_p, the best-known active regression algorithms involve iterated sampling and solving sequences of residual systems. Here, we show that for the full range of p > 0, one-shot importance sampling based on the matrix A suffices for active regression, matching the known bounds, which are optimal up to poly(log n) factors. Our results hold in substantial generality, yielding the first near-optimal active regression algorithms for non-homogeneous loss functions such as the Huber loss, the Tukey loss, and their generalizations.

Joint work with Yang Liu.
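
To make the one-shot sampling paradigm concrete, here is a minimal sketch of the classical leverage-score approach for the p = 2 case mentioned above: sample rows of A proportional to their leverage scores, reweight, and solve the downsampled least-squares problem, querying only the sampled entries of b. The names (`leverage_scores`, `active_least_squares`, `query_b`) and the NumPy setting are illustrative assumptions, not an implementation from the paper.

```python
import numpy as np

def leverage_scores(A):
    # The leverage score of row i is the squared norm of the i-th row of U,
    # where A = U diag(s) V^T is a thin SVD of A.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return (U ** 2).sum(axis=1)

def active_least_squares(A, query_b, num_samples, rng=None):
    """One-shot leverage-score sampling sketch for min_x ||Ax - b||_2.

    `query_b(i)` returns entry b_i on demand, so only the sampled
    entries of b are ever examined (illustrative interface).
    """
    rng = rng or np.random.default_rng()
    m, n = A.shape
    tau = leverage_scores(A)
    p = tau / tau.sum()                          # sampling distribution over rows
    idx = rng.choice(m, size=num_samples, p=p)   # i.i.d. row indices drawn from p
    w = 1.0 / np.sqrt(num_samples * p[idx])      # importance-sampling reweighting
    SA = w[:, None] * A[idx]                     # reweighted sampled rows of A
    Sb = w * np.array([query_b(i) for i in idx]) # only these entries of b are read
    x, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x
```

With num_samples on the order of n log n, the solution of the downsampled system is a near-optimal solution to the full least-squares problem with good probability; for p ≠ 2 the sampling distribution would instead come from (ℓ_p) Lewis weights of A, as described in the abstract.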