Abstract
This part of the tutorial builds on the first part, which covered MDPs, and focuses on reinforcement learning. We present three algorithms, TD learning, Q-learning, and natural policy gradient, and outline the key ideas behind obtaining finite-time performance bounds for each.