Abstract
Probabilistic models remain a hugely popular class of techniques in machine learning, and their expressiveness has been extended by modern large-scale compute. While exciting, these generalizations almost always come with approximations, and researchers typically ignore the fundamental influence of those computational approximations. As a result, the outputs of modern probabilistic methods become as much about the approximation method as about the data and the model, undermining both the Bayesian principle and the practical utility of inference in probabilistic models for real applications in science and industry.
To expose this issue and to demonstrate how to do approximate inference correctly in at least one model class, in this talk I will derive a new class of Gaussian process (GP) approximations that provide consistent estimation of the combined posterior arising from both the finite number of data observed *and* the finite amount of computation expended. The most common GP approximations, including methods based on the Cholesky factorization, conjugate gradients, and inducing points, map to instances of this class. I will show the consequences of ignoring computational uncertainty and prove that implicitly modeling it improves generalization performance. I will also show how to perform model selection while accounting for computation, and I will describe an application to neurobiological data.
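As a hedged sketch of the idea of a combined posterior, one illustrative way to write it is below; the notation (kernel k, targets y, noise variance σ², and a low-rank precision estimate C_i obtained after i steps of an iterative solver) is an assumption for exposition, not necessarily the exact construction presented in the talk.

```latex
% Illustrative sketch only (assumed notation, not verbatim from the talk):
% GP prior with mean \mu and kernel k, training inputs X, noisy targets y,
% \hat{K} = k(X, X) + \sigma^2 I, and C_i \approx \hat{K}^{-1} after i solver iterations.
\begin{align*}
  \mu_i(x)   &= \mu(x) + k(x, X)\, C_i\, \bigl(y - \mu(X)\bigr), \\
  k_i(x, x') &= \underbrace{k(x, x') - k(x, X)\, \hat{K}^{-1} k(X, x')}_{\text{uncertainty from finite data}}
              \;+\; \underbrace{k(x, X)\,\bigl(\hat{K}^{-1} - C_i\bigr)\, k(X, x')}_{\text{uncertainty from finite computation}}.
\end{align*}
```

Under this sketch, exact inference corresponds to C_i reaching the full precision matrix, at which point the computational term vanishes and only the usual data-driven posterior uncertainty remains.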