Abstract
This tutorial will focus on the online learning perspective on reinforcement learning, in which the model is unknown and the learner incurs regret for the actions it selects during the learning process itself. Building on the preceding talks, as well as yesterday's tutorials on multi-armed bandits, we will focus on the challenges that arise in analysing regret under Markovian dynamics. We will also discuss the interaction between learning and function approximation, the role of structure, and existing challenges and open problems.