Abstract
Using a game-theoretic analysis, we show how to derive an efficient algorithm for making "multicalibrated" contextual predictions in an online adversarial environment. That is, given an arbitrary collection G of subsets of the feature space (representing, e.g., demographic groups), we can guarantee that our predictions are calibrated not just overall, but also within each group of interest. Using this as a subroutine, we can make predictions that "calibeat" an arbitrary collection of predictors f, in the sense of Foster and Hart: not only are our predictions calibrated, but they have squared error less than that of the predictors we compete against; in fact, strictly less, to an extent controlled by how far the predictors f are from being calibrated themselves. Our construction has only a logarithmic dependence on the number of predictors we wish to calibeat, an exponential improvement over the original construction of Foster and Hart. Our techniques also easily allow us to "multicalibeat" the comparator predictors: i.e., to calibeat them not just overall, but also within each of the (arbitrarily intersecting) groups in G.
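In symbols, the calibeating guarantee can be sketched as follows (notation ours, based on the classical calibration/refinement decomposition of the Brier score; the paper's exact error terms may differ):
\[
  B(f) \;=\; \frac{1}{T}\sum_{t=1}^{T}\bigl(f_t - y_t\bigr)^2,
  \qquad
  B(f) \;=\; K(f) + R(f),
\]
where $B(f)$ is the squared-error (Brier) score of predictions $f_1,\dots,f_T$ against outcomes $y_1,\dots,y_T \in [0,1]$, $K(f) \ge 0$ is the calibration score of $f$, and $R(f)$ is its refinement score. Our predictions $p$ calibeat $f$ when
\[
  B(p) \;\le\; B(f) - K(f) + \varepsilon_T \;=\; R(f) + \varepsilon_T,
\]
with $\varepsilon_T \to 0$ as $T \to \infty$: the squared error of $p$ falls strictly below that of $f$ whenever $f$ is itself miscalibrated ($K(f) > 0$).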