Abstract
Consider the following basic problem in sparse linear regression -- an algorithm gets labeled samples of the form (x, w.x + \eps) where w is an unknown n-dimensional vector, x is drawn from a background n-dimensional distribution D, and \eps is some independent noise. Given the promise that w is k-sparse, the breakthrough work of Candes, Romberg and Tao shows that w can be recovered with a number of samples that scales as O(k log n). This should be contrasted with general linear regression where O(n) samples are information theoretically necessary. We look at this problem from the vantage point of property testing, and study the complexity of determining whether the unknown vector w is k-sparse versus "far" from k-sparse. We show that this decision problem can be solved with a number of samples which is independent of n as long as the background distribution D is i.i.d. and its components are not Gaussian. We further show that weakening any of the conditions in this result necessarily makes the complexity scale logarithmically in n.
Joint work with Xue Chen (George Mason University) and Anindya De (University of Pennsylvania).