Abstract

We use randomized numerical linear algebra techniques to develop a new fast algorithm for estimating the leverage scores of an autoregressive model in big data regimes. We show that, with high probability, the approximations are within a factor of $(1+\bigO{\varepsilon})$ of the true leverage scores. We then exploit these theoretical results to develop an efficient leverage score sampling algorithm that fits an appropriate autoregressive model to big time series data and finds the maximum likelihood estimates of its parameters. Empirical results on large-scale synthetic and real data strongly support the theoretical results and demonstrate the efficacy of the new approach.
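To make the sampling idea concrete, here is a minimal Python sketch. It assumes a zero-mean AR($p$) model fitted by conditional least squares, which coincides with the conditional maximum likelihood estimate under Gaussian noise. The sketch computes the leverage scores $h_i = \mathbf{x}_i^\top (X^\top X)^{-1}\mathbf{x}_i$ of the lagged design matrix exactly, not with the fast randomized approximation described above. It then samples $s$ rows with probability proportional to $h_i$ and solves the rescaled subproblem. The function names `ar_design`, `leverage_scores`, and `leverage_sample_fit` are hypothetical and serve only as illustration.

```python
import random

def ar_design(y, p):
    """Hypothetical helper: AR(p) least-squares problem, row t = (y[t-1], ..., y[t-p]), target y[t]."""
    X = [[y[t - j] for j in range(1, p + 1)] for t in range(p, len(y))]
    b = [y[t] for t in range(p, len(y))]
    return X, b

def solve(A, c):
    """Solve the small square system A x = c by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for j in range(k, n + 1):
                M[r][j] -= f * M[k][j]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x

def gram(X):
    """Return X^T X for a list-of-rows matrix X."""
    p = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]

def leverage_scores(X):
    """Exact leverage scores h_i = x_i^T (X^T X)^{-1} x_i (not the fast approximation)."""
    G = gram(X)
    p = len(G)
    # cols[j] = G^{-1} e_j, i.e. column j of the (symmetric) inverse Gram matrix
    cols = [solve(G, [1.0 if i == j else 0.0 for i in range(p)]) for j in range(p)]
    return [sum(x[j] * cols[j][k] * x[k] for j in range(p) for k in range(p)) for x in X]

def leverage_sample_fit(y, p, s, seed=0):
    """Hypothetical sketch: fit AR(p) coefficients from s leverage-sampled rows."""
    X, b = ar_design(y, p)
    h = leverage_scores(X)
    total = sum(h)  # equals p when X has full column rank
    probs = [hi / total for hi in h]
    rng = random.Random(seed)
    idx = rng.choices(range(len(X)), weights=probs, k=s)  # sampling with replacement
    # Rescale each sampled row by 1/sqrt(s * pi_i) so E[(SX)^T (SX)] = X^T X
    SX, Sb = [], []
    for i in idx:
        w = 1.0 / (s * probs[i]) ** 0.5
        SX.append([w * v for v in X[i]])
        Sb.append(w * b[i])
    rhs = [sum(row[j] * bi for row, bi in zip(SX, Sb)) for j in range(p)]
    return solve(gram(SX), rhs)  # normal equations of the sampled problem
```

Computing the exact scores this way costs $\mathcal{O}(np^2)$ operations for $n$ rows. A fast approximation is meant to replace this step. The final solve touches only the $s$ sampled rows.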