Lihong Li (Google Brain)
Value function approximation lies at the heart of almost all reinforcement learning algorithms. The dominant approaches in the literature are based on dynamic programming: they apply various forms of the Bellman operator to iteratively update the value function estimate, in the hope that it converges to the true value function. While successful in several prominent cases, these methods often lack convergence guarantees and are hard to analyze, except in rather restricted settings.
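As a concrete picture of the dynamic-programming style of update, the following minimal sketch repeatedly applies the Bellman operator for a fixed policy on a tiny tabular problem. The two-state Markov reward process, its rewards, transition probabilities, discount factor, and the names `bellman_operator` and `evaluate` are all hypothetical and chosen only for illustration. In this tabular case the operator is a contraction, so the iteration does converge; the difficulties mentioned above arise once the value function is represented by a function approximator.

```python
# Hypothetical two-state Markov reward process; every number is illustrative.
GAMMA = 0.9                      # discount factor (assumed)
REWARD = [1.0, 0.0]              # r(s) for states 0 and 1 (assumed)
TRANSITION = [[0.5, 0.5],        # P(s' | s = 0) (assumed)
              [0.2, 0.8]]        # P(s' | s = 1) (assumed)


def bellman_operator(v):
    """Return T v, where (T v)(s) = r(s) + gamma * sum_s' P(s' | s) * v(s')."""
    return [
        REWARD[s] + GAMMA * sum(p * v[s2] for s2, p in enumerate(TRANSITION[s]))
        for s in range(len(v))
    ]


def evaluate(tol=1e-10, max_iters=10_000):
    """Iterate v <- T v until successive estimates differ by less than tol."""
    v = [0.0] * len(REWARD)
    for _ in range(max_iters):
        v_next = bellman_operator(v)
        if max(abs(a - b) for a, b in zip(v, v_next)) < tol:
            return v_next
        v = v_next
    return v


values = evaluate()
print(values)
```

The returned `values` is an approximate fixed point of the operator, that is, applying `bellman_operator` to it leaves it essentially unchanged.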
In this talk, we will focus on a different approach that has recently attracted fast-growing interest. The key is to frame value function approximation as a more standard optimization problem with an easy-to-optimize objective function. Doing so often leads to better, provably convergent algorithms whose theoretical properties can be analyzed more conveniently using existing techniques from statistical machine learning.