Many recent Markov chain Monte Carlo (MCMC) samplers leverage continuous dynamics to define a transition kernel that rapidly explores a target distribution. In tandem, there has been a focus on devising scalable variants that use stochastic gradients when simulating the dynamics. However, such stochastic gradient MCMC (SG-MCMC) samplers have lagged behind their full-data counterparts in terms of the complexity of dynamics considered, since proving convergence in the presence of stochastic gradient noise is nontrivial.
In this talk, we first present a general recipe for constructing SG-MCMC samplers that translates the task of finding a valid sampler into one of choosing two matrices. Importantly, any continuous Markov process sampling from the target distribution can be written in our framework. We then turn our attention to MCMC techniques based on jump processes, and propose an algorithm akin to Metropolis-Hastings---with the same ease of implementation---but allowing for irreversible dynamics. Finally, we describe how SG-MCMC algorithms can be applied to problems involving dependent data, where the challenge arises from the need to break the dependencies when considering minibatches of observations. We propose an algorithm that harnesses the inherent memory decay of the process and provably leads to the correct stationary distribution.
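To make the two-matrix recipe concrete, the following is a minimal sketch (not the talk's actual algorithm) of its simplest instance: choosing the diffusion matrix D as the identity and the curl matrix Q as zero recovers stochastic gradient Langevin dynamics. The 1-D Gaussian target, the variable names D and Q, and the injected gradient noise standing in for minibatch noise are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative target: standard normal, so U(theta) = theta^2 / 2 and
# grad U(theta) = theta. The added noise is a stand-in for minibatch noise.
def stochastic_grad_U(theta):
    return theta + 0.1 * rng.standard_normal()

# Recipe (sketch): drift = -(D + Q) grad U, Gaussian noise with covariance
# 2 * eps * D per step. With constant D, Q the correction term vanishes.
# D = 1, Q = 0 yields stochastic gradient Langevin dynamics (SGLD).
D, Q = 1.0, 0.0
eps = 1e-2
theta = 0.0
samples = []
for _ in range(50_000):
    drift = -(D + Q) * stochastic_grad_U(theta)
    theta += eps * drift + np.sqrt(2 * eps * D) * rng.standard_normal()
    samples.append(theta)

samples = np.array(samples)[10_000:]  # discard burn-in
print(samples.mean(), samples.std())  # should be near 0 and 1, respectively
```

Different choices of D and Q recover other known samplers (e.g., stochastic gradient Hamiltonian Monte Carlo), which is the sense in which the recipe reduces sampler design to picking two matrices.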
We demonstrate our methods on a number of large-scale applications, including a streaming Wikipedia LDA analysis and segmentation of a lengthy genome sequence and an ion channel recording.