Monday, Nov. 30, 2020

Reductionism in Reinforcement Learning | Richard M. Karp Distinguished Lecture

In this Richard M. Karp Distinguished Lecture, Dale Schuurmans (Google Brain & University of Alberta) discusses reinforcement learning in Markov decision processes, where the challenges of value estimation, sequential planning, and exploration arise together. By eliminating exploration from consideration, recent work on offline reinforcement learning has led to improved methods for value estimation and sequential planning. Taking the reduction a step further, Schuurmans reconsiders the relationship between value estimation and sequential planning and shows that a unified approach faces fundamental, unresolved difficulties whenever generalization is considered. Separating these two challenges makes improved value estimation methods readily achievable, while making the shortcomings of popular planning strategies more apparent and, in some cases, possible to overcome. Schuurmans illustrates how these reductions allow other areas of machine learning, optimization, control, and planning to be better leveraged in reinforcement learning.