In general we do not expect the kinds of rules that we know how to fit, such as linear rules, to accurately model the true distribution of the data we study. Hence we desire algorithms that can find rules that fit portions of the data. We are particularly interested in cases where good predictors are only valid on a small subset of the data, less than 50%. Recent work on such tasks often cannot achieve meaningful guarantees for rich problems such as linear prediction or classification without using some extra structure. In the conditional linear regression task, we seek a k-DNF rule describing such a subset of the data distribution on which a low-error linear predictor exists, together with a description of this linear predictor function. I will discuss new algorithms for this problem, for the usual squared-error loss, that find a k-DNF that selects nearly the same fraction of the data as some optimal k-DNF. On the k-DNF we find, we can guarantee for example that there is a linear rule achieving a O~(r log log n) approximation to the loss of the optimal linear rule on an optimal r-term k-DNF on n attributes, given that either the covariance of the distribution on the individual terms does not vary too much, or given that the linear rule is O(1)-sparse.
This talk is based on joint works with Calderon, Li, Li, and Ruan; and with Hainline, Le, and Woodruff.