Abstract

An answer to this question may require a relaxed objective. Rather than transport a probability distribution to a target, suppose we are interested only in matching moments, perhaps only approximately. An elegant collection of equations emerges whose solution appears ideally suited to Monte Carlo methods. In particular, the gradient and Hessian of the objective function can be expressed in terms of second-order statistics that are easily estimated using stochastic approximation. The talk will review this theory and send out a cry for help: what is the best way to solve such problems when the objective function is very poorly conditioned and the stochastic processes involved may have significant memory? The lecture is loosely based on Kullback-Leibler Quadratic Optimal Control, Cammardella, Busic and Meyn 2020 [arXiv:2004.01798]. The "Zap" in the title refers to Zap Q-learning, which is based on the Newton-Raphson flow (a second-order method for root finding). (Can present for any length of time needed.)
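
To make the role of second-order statistics concrete, here is a minimal sketch of moment matching by exponential tilting. It is an illustration under stated assumptions, not the formulation of the cited paper: the nominal distribution is taken to be a standard normal, the matched features are x and x^2, and the target moments are chosen arbitrarily. The dual objective is log E[exp(theta . phi(X))] - theta . m, whose gradient is the tilted mean of the features minus the target and whose Hessian is the tilted covariance. Both are estimated here from i.i.d. samples, and the iteration is an Euler discretization of the Newton-Raphson flow. The names tilted_stats and match_moments are hypothetical.

```python
import numpy as np


def tilted_stats(theta, feats):
    """Mean and covariance of the features under the exponentially tilted
    empirical distribution, with weights proportional to exp(theta . phi(x_i))."""
    logw = feats @ theta
    w = np.exp(logw - logw.max())  # subtract the max for numerical stability
    w /= w.sum()
    mean = w @ feats
    centered = feats - mean
    cov = (centered * w[:, None]).T @ centered
    return mean, cov


def match_moments(feats, target, step=0.5, iters=60):
    """Damped Newton iteration (Euler steps of the Newton-Raphson flow) on the
    dual objective log E[exp(theta . phi)] - theta . target.
    Assumption: step size and iteration count are illustrative choices."""
    theta = np.zeros(feats.shape[1])
    for _ in range(iters):
        mean, cov = tilted_stats(theta, feats)
        grad = mean - target  # gradient: first-order statistic minus target
        theta = theta - step * np.linalg.solve(cov, grad)  # Hessian: covariance
    return theta


# Assumed setup: standard normal nominal model, features (x, x^2).
rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
feats = np.column_stack([x, x**2])
target = np.array([0.5, 1.5])  # hypothetical target first and second moments

theta = match_moments(feats, target)
fitted_mean, _ = tilted_stats(theta, feats)
print("theta:", theta)
print("tilted moments:", fitted_mean)
```

In this Gaussian example the exact solution is theta = (0.4, 0.1), so the Monte Carlo estimate can be checked against it. The sketch uses independent samples and a well-conditioned covariance; it does not address the poorly conditioned case or processes with significant memory, which are the open questions raised in the abstract.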