Abstract

Offline reinforcement learning algorithms make it possible to train policies using previously collected data. They introduce a number of complex new challenges that standard RL methods, which can gather additional data through exploration, do not need to contend with. At the same time, however, they allow us to isolate the optimization challenges in RL independently of exploration. In this talk, I will discuss recent work from both perspectives. I will talk about our recent analysis of the role of implicit regularization in deep reinforcement learning, which shows that, in stark contrast to supervised learning, the representations learned by standard deep RL methods do not benefit in the same way from the implicit regularization present in SGD, leading to potentially poor solutions. While this issue is present in both online and offline RL, the offline RL setting makes it far easier to study by controlling for exploration. I will present a theoretical analysis of this implicit regularization effect and its practical implications. Then, I will discuss recent advances that improve offline RL by leveraging representations in deep neural networks that enable broad generalization. In particular, I will show how effective policy improvement can be achieved via dynamic programming without ever querying any states or actions that are not in the dataset. Lastly, I will conclude with a brief discussion of applications and potential future work that addresses the broader implications of offline RL methods.
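
To make the in-dataset dynamic programming idea concrete, the sketch below is a hypothetical toy illustration (not the specific method discussed in the talk): tabular value iteration over an offline dataset in which the bootstrap target maximizes only over actions that were actually observed at each next state, so no out-of-dataset state or action is ever queried. The states, actions, and rewards are made up for illustration.

    # Toy sketch (assumed, illustrative only): dynamic programming restricted
    # to state-action pairs that appear in the offline dataset.
    from collections import defaultdict

    # Offline dataset of transitions: (state, action, reward, next_state)
    dataset = [
        ("s0", "a0", 0.0, "s1"),
        ("s1", "a1", 1.0, "s2"),
        ("s1", "a0", 0.0, "s0"),
        ("s2", "a0", 0.0, "s2"),  # s2 treated as absorbing here
    ]

    gamma = 0.9
    q = defaultdict(float)  # Q-values exist only for (s, a) pairs seen in the data

    # Actions observed in the dataset at each state (the "in-sample" action set)
    actions_in_data = defaultdict(set)
    for s, a, r, s_next in dataset:
        actions_in_data[s].add(a)

    for _ in range(100):  # value iteration restricted to the dataset
        new_q = {}
        for s, a, r, s_next in dataset:
            # The target maximizes only over actions actually observed at s_next,
            # so no out-of-dataset action is ever evaluated.
            next_vals = [q[(s_next, a2)] for a2 in actions_in_data[s_next]]
            target = r + gamma * (max(next_vals) if next_vals else 0.0)
            new_q[(s, a)] = target
        q.update(new_q)

    # Greedy policy, again choosing only among in-dataset actions per state
    policy = {s: max(acts, key=lambda a: q[(s, a)]) for s, acts in actions_in_data.items()}
    print(policy)

In deep offline RL the same principle is applied with function approximation rather than a table, but the toy example captures the key constraint: every Bellman backup is computed from quantities that are directly supported by the dataset.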
