Abstract

We study the PCA and column subset selection problems in matrices in an online setting, where the columns arrive one after the other. In the context of column subset selection, the goal is to decide whether to include or discard a column, as it arrives. We design a simple algorithm that includes at most O(k \polylog n) columns overall and achieves a multiplicative (1+\epsilon) error compared to the best rank-k approximation of the full matrix. This result may be viewed as an analog of the classic result of Myerson on online clustering.

Video Recording