Learning to make sequential decisions from interaction raises numerous challenges, including temporal prediction, state and value estimation, sequential planning, exploration, and strategic interaction. Each challenge is independently difficult, yet reinforcement learning research has traditionally sought holistic solutions that rely on unified learning principles. I will argue that such a holistic approach has hampered progress. To make the point, I focus on reinforcement learning in Markov Decision Processes, where the challenges of value estimation, sequential planning, and exploration are jointly raised. By eliminating exploration from consideration, recent work on offline reinforcement learning has led to improved methods for value estimation and sequential planning. Taking the reduction a step further, I reconsider the relationship between value estimation and sequential planning, and show that a unified approach faces fundamental, unresolved difficulties whenever generalization is considered. Instead, by separating these two challenges, improved value estimation methods can be readily achieved, while the shortcomings of popular planning strategies can be made more apparent and in some cases overcome. I will attempt to illustrate how these reductions allow other areas of machine learning, optimization, control, and planning to be better leveraged in reinforcement learning.
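As a minimal illustration of the separation argued for above (not drawn from the talk itself), the classic tabular setting already decouples the two challenges: value estimation reduces to Bellman backups for a fixed policy, while planning reduces to greedy policy extraction from the estimated values. The toy MDP, function names, and constants below are all hypothetical, chosen only to make the two steps explicit.

```python
# Hypothetical tabular MDP: deterministic transitions,
# P[state][action] = (next_state, reward).
P = {
    0: {0: (0, 0.0), 1: (1, 1.0)},
    1: {0: (0, 0.0), 1: (1, 2.0)},
}
GAMMA = 0.9  # discount factor (assumed)

def evaluate(policy, iters=500):
    """Value estimation alone: iterative Bellman backups for a fixed policy."""
    V = {s: 0.0 for s in P}
    for _ in range(iters):
        V = {s: P[s][policy[s]][1] + GAMMA * V[P[s][policy[s]][0]] for s in P}
    return V

def improve(V):
    """Planning alone: one-step greedy policy extraction from estimated values."""
    return {s: max(P[s], key=lambda a: P[s][a][1] + GAMMA * V[P[s][a][0]])
            for s in P}

# Alternating the two decoupled steps recovers policy iteration.
policy = {0: 0, 1: 0}  # arbitrary initial policy
for _ in range(5):
    V = evaluate(policy)
    policy = improve(V)
```

In this toy case the alternation converges to the policy that always takes action 1; the point of the sketch is only that neither step needs to know about the other's internals, which is the kind of decoupling the abstract argues becomes valuable once function approximation and generalization enter the picture.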