
Abstract
In this talk, we study the problem of predicting (and optimizing) the counterfactual behavior of large-scale ML models. We start by focusing on “data counterfactuals,” where the goal is to estimate the effect of modifying a training dataset on the resulting model's outputs (and, conversely, to design datasets that induce specific desired behavior). We introduce a method that estimates such counterfactuals almost perfectly, unlocking new possibilities in the design and evaluation of ML models, including state-of-the-art data attribution, data selection, and data poisoning.
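
For concreteness, the sketch below illustrates the quantity in question, not the estimation method introduced in the talk: a "data counterfactual" is the change in a model's output when the training set is modified, computed here exactly by brute-force retraining of a small ridge-regression model. All function names and parameters are illustrative assumptions.

```python
# Minimal sketch (assumption: ridge regression as a stand-in for a large-scale model).
# The "data counterfactual" is the change in a test prediction when some training
# points are removed; here it is computed exactly by retraining, which is the
# quantity that counterfactual-estimation methods aim to predict without retraining.
import numpy as np


def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge-regression weights."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def data_counterfactual(X, y, x_test, removed_idx, lam=1e-2):
    """Exact effect on a test prediction of dropping `removed_idx` from training."""
    w_full = fit_ridge(X, y, lam)
    keep = np.setdiff1d(np.arange(len(y)), removed_idx)
    w_cf = fit_ridge(X[keep], y[keep], lam)
    return x_test @ w_cf - x_test @ w_full  # prediction change under the counterfactual


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=200)
x_test = rng.normal(size=5)
print(data_counterfactual(X, y, x_test, removed_idx=[0, 1, 2]))
```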