Reinforcement learning is typically framed as an active learning paradigm: an agent interacts with its environment, collects experience, and incorporates that experience into a model, policy, or value function to improve performance on a given task. However, applying this active learning framework in real-world settings is often very challenging. Many real-world domains pose serious cost and safety risks for a system that learns through trial and error. Furthermore, modern deep neural network models generally require large and diverse datasets to generalize effectively, and collecting such datasets online for every experiment can be prohibitively expensive. Reinforcement learning algorithms that utilize previously collected data offer an appealing alternative: by leveraging prior data, RL methods can sidestep the difficulties of active data collection, reuse large and diverse datasets (following a formula that has been exceedingly successful in supervised machine learning), and become applicable to a much wider range of problems than was previously possible. In this talk, I will discuss recent advances in offline reinforcement learning algorithms (also called batch reinforcement learning), which are gradually making it feasible to perform reinforcement learning from data rather than from active interaction.
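To make the distinction concrete, the sketch below (a hypothetical toy example, not drawn from the talk) contrasts the two phases of the offline setting: a fixed dataset of transitions is collected once by a behavior policy, and learning then proceeds purely from that static batch, with no further environment interaction. The chain MDP, the random behavior policy, and the tabular Q-learning update are all illustrative assumptions.

```python
import random

# Hypothetical toy MDP: a 5-state chain. Action 1 moves right, action 0
# moves left; reaching state 4 yields reward 1 and ends the episode.
N_STATES, GOAL = 5, 4

def step(s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), GOAL)
    r = 1.0 if s2 == GOAL else 0.0
    return s2, r, s2 == GOAL

# Phase 1: collect a fixed dataset with a random behavior policy.
# This is the only point at which the environment is queried.
random.seed(0)
dataset = []
for _ in range(200):
    s = 0
    for _ in range(20):
        a = random.randint(0, 1)
        s2, r, done = step(s, a)
        dataset.append((s, a, r, s2, done))
        if done:
            break
        s = s2

# Phase 2: offline (batch) Q-learning -- repeatedly sweep the static
# dataset; no new experience is gathered during learning.
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma = 0.1, 0.9
for _ in range(100):
    for s, a, r, s2, done in dataset:
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])

# The greedy policy recovered from the batch moves right toward the goal.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)]
print(policy)
```

In this tabular case the batch is rich enough for Q-learning to recover the optimal policy; the talk's subject is what happens in the harder function-approximation setting, where naively applying the same update to a fixed dataset can fail.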


Video Recording