Sparse linear regression is a fundamental problem in high-dimensional statistics, yet strikingly little is known about how to solve it efficiently without restrictive conditions on the design matrix. This talk focuses on the (correlated) random-design setting, where the covariates are drawn independently from a multivariate Gaussian and the ground-truth signal is sparse. Information-theoretically, one can achieve strong error bounds with O(k log n) samples for arbitrary covariance matrices and sparsity k; however, no efficient algorithms are known to match these guarantees, even with o(n) samples, in general. As for hardness, computational lower bounds are known only for worst-case design matrices. Random-design instances that are hard for the Lasso are known, but these instances can generally be solved by the Lasso after a simple change of basis (i.e., preconditioning). In this talk, we will discuss new upper and lower bounds clarifying the power of preconditioning in sparse linear regression. First, we show that the preconditioned Lasso can solve a large class of sparse linear regression problems nearly optimally: it succeeds whenever the dependency structure of the covariates, in the sense of the Markov property, has low treewidth -- even if the covariance matrix is highly ill-conditioned. Second, we construct (for the first time) random-design instances that are provably hard for an optimally preconditioned Lasso. Based on joint works with Jonathan Kelner, Frederic Koehler, and Dhruv Rohatgi.
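To make the change-of-basis idea concrete, here is a minimal sketch in Python/NumPy. It is only an illustration of preconditioning in general, not the constructions or guarantees from the talk: the AR(1)-style covariance, the ISTA Lasso solver, and all parameter choices below are illustrative assumptions. We draw a correlated Gaussian design, whiten it by Σ^{-1/2} (the preconditioning step), run the Lasso in the preconditioned coordinates, and map the solution back.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 50, 3  # samples, dimension, sparsity (illustrative sizes)

# Chain-structured (AR(1)-style) covariance: strongly correlated covariates
# whose Markov dependency graph is a path, i.e. treewidth 1.
Sigma = np.array([[0.9 ** abs(i - j) for j in range(d)] for i in range(d)])

X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
w_star = np.zeros(d)
w_star[:k] = 1.0  # sparse ground-truth signal
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Preconditioning: change basis by Sigma^{-1/2} so the covariates become
# isotropic in the new coordinates.
eigval, eigvec = np.linalg.eigh(Sigma)
S_inv_half = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
X_pre = X @ S_inv_half

def lasso_ista(A, b, lam, iters=2000):
    """Basic proximal-gradient (ISTA) solver for
    min_w 0.5 * ||A w - b||^2 + lam * ||w||_1 -- kept simple on purpose."""
    w = np.zeros(A.shape[1])
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth part
    for _ in range(iters):
        z = w - A.T @ (A @ w - b) / L          # gradient step
        w = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return w

# Solve the Lasso in the preconditioned basis, then undo the change of basis:
# X_pre @ v = X @ (S_inv_half @ v), so w_hat lives in the original coordinates.
v = lasso_ista(X_pre, y, lam=5.0)  # lam is a rough, untuned choice
w_hat = S_inv_half @ v
```

The point of the whitening step is exactly the one in the abstract: the same Lasso solver is run, only in a different basis, and the quality of recovery depends on how the chosen basis interacts with the sparsity of the signal.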