Abstract
Modern supervised machine learning algorithms are at their best when provided with large datasets and large, high-capacity models. This data-driven paradigm has driven remarkable progress in fields ranging from computer vision to natural language processing and speech recognition. However, reinforcement learning algorithms have proven difficult to scale to such large data regimes without the use of simulation. Online reinforcement learning algorithms require recollecting data in each experiment; when the dataset is on the scale of ImageNet or MS-COCO, this becomes infeasible to do in the real world. Offline reinforcement learning algorithms have so far proven difficult to integrate with deep neural networks. In this talk, I will discuss some recent advances in offline reinforcement learning that I believe represent a step toward bridging this gap, and that also carry a number of appealing theoretical properties. I will discuss the theoretical reasons why offline reinforcement learning is challenging, survey the solutions that have been proposed in the literature, and describe our recent advances in developing conservative Q-learning methods that provide theoretical guarantees in the face of distributional shift. These methods offer not only a practical way of deploying offline deep RL, but also a degree of confidence that the resulting solution will avoid the common pitfalls of overestimation. I will also describe how the ideas in offline RL can be applied more broadly to data-driven optimization problems that do not necessarily involve sequential decision making, such as designing protein sequences or robot morphologies from prior experimental data. This emerging field shares many theoretical foundations with offline RL, but offers the possibility of broadening the impact of data-driven decision making to a wider range of applications.
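As a concrete anchor for the conservative Q-learning methods mentioned above, the following is a minimal sketch of the published CQL training objective, assuming that is the formulation the talk refers to; the symbols here ($\mathcal{D}$ for the offline dataset, $\mu$ for the distribution used to sample actions, $\hat{\mathcal{B}}^{\pi}$ for the empirical Bellman backup, and $\alpha$ for the conservatism weight) are notational assumptions, not taken from the abstract itself:

$$
\min_{Q} \;\; \alpha \Big( \mathbb{E}_{s \sim \mathcal{D},\, a \sim \mu(\cdot \mid s)}\big[ Q(s, a) \big] \;-\; \mathbb{E}_{(s, a) \sim \mathcal{D}}\big[ Q(s, a) \big] \Big) \;+\; \tfrac{1}{2}\, \mathbb{E}_{(s, a, s') \sim \mathcal{D}}\Big[ \big( Q(s, a) - \hat{\mathcal{B}}^{\pi} \hat{Q}(s, a) \big)^{2} \Big]
$$

The first term pushes down Q-values at actions sampled from $\mu$ (typically the policy being optimized), the second term pushes them back up at actions actually present in the dataset, and the final term is the standard Bellman error. Together, these terms produce Q-functions that tend to lower-bound the true values on out-of-distribution actions, which is what underlies the guarantees against overestimation under distributional shift described in the abstract.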