Abstract
Modern datasets often contain outliers, defying the standard "i.i.d." assumption in the literature. These outliers can cause classical off-the-shelf algorithms to fail drastically. To address this issue, a large body of work over the past decade has developed algorithms that are robust to outliers and run in polynomial time.
However, for these robust algorithms to be useful in large-scale machine learning pipelines, we need a fine-grained understanding that goes beyond polynomial runtime. Ideally, we want robust algorithms with resource requirements (runtime, memory, communication) similar to those of their non-robust counterparts.
In this talk, I will focus on the fundamental problem of robust sparse mean estimation, for which existing algorithms require at least quadratic time, limiting their applicability. Our main result is the first algorithm for this task that runs in sub-quadratic time.