Abstract

The principle of optimism in the face of uncertainty underpins many theoretically successful reinforcement learning algorithms. In this talk I will discuss a general framework for designing, analyzing, and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms were typically thought to be distinct, with model-optimistic algorithms benefiting from a cleaner probabilistic analysis and value-optimistic algorithms being easier to implement and thus more practical. The new framework makes it possible to get the best of both worlds: I will discuss a class of algorithms that admit a computationally efficient dynamic-programming implementation together with a simple probabilistic analysis. Besides capturing many existing algorithms in the tabular setting, the framework also extends to large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
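
As a rough illustration of the two viewpoints (the notation below is a schematic of my own, not taken from the talk): write \widehat{P} for an empirical transition model and \mathcal{P}(s,a) for a confidence set of plausible models around it. A model-optimistic backup plans in the most favorable plausible model,

    \widetilde{Q}_h(s,a) = r(s,a) + \max_{P \in \mathcal{P}(s,a)} \langle P, \widetilde{V}_{h+1} \rangle,

whereas a value-optimistic backup adds an exploration bonus b_h directly in the dynamic-programming recursion,

    \widetilde{Q}_h(s,a) = r(s,a) + \langle \widehat{P}(\cdot \mid s,a), \widetilde{V}_{h+1} \rangle + b_h(s,a).

Roughly speaking, taking the Lagrangian dual of the inner maximization over \mathcal{P}(s,a) turns the constraint defining the confidence set into an additive term, which is how a backup of the first kind can be rewritten in the second, bonus-based form.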

Video Recording