*Theory of Reinforcement Learning workshop*
Abstract
Offline reinforcement learning (RL) refers to the problem of learning policies entirely from a batch of previously collected data, without further interaction with the environment. In this talk, I will discuss MOPO, a model-based offline deep reinforcement learning algorithm. The key idea is to modify existing model-based RL methods by penalizing the reward with an uncertainty estimate of the learned dynamics. We maximize this penalized return, which we show theoretically to be a lower bound on the return in the true Markov decision process (MDP). MOPO outperforms standard model-based RL methods and existing state-of-the-art model-free offline RL approaches on offline RL benchmarks.
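The uncertainty-penalized reward can be sketched as below. This is a minimal illustration, not MOPO's exact estimator: it uses ensemble disagreement (per-dimension standard deviation across an ensemble's next-state predictions) as a stand-in for the dynamics-uncertainty quantifier, and the names `penalized_reward` and `lam` are illustrative assumptions.

```python
import numpy as np

def penalized_reward(reward, next_state_preds, lam=1.0):
    """Uncertainty-penalized reward (sketch of the MOPO idea).

    next_state_preds: array of shape (ensemble_size, state_dim),
    the ensemble members' predicted next state for one (s, a) pair.
    The penalty grows with ensemble disagreement, so the agent is
    discouraged from exploiting state-action regions where the
    learned dynamics model is unreliable.
    """
    # Max per-dimension std across ensemble members as a cheap
    # uncertainty proxy (illustrative; not MOPO's exact u(s, a)).
    uncertainty = np.std(next_state_preds, axis=0).max()
    return reward - lam * uncertainty

# Toy usage: when the ensemble agrees, the reward is untouched;
# when it disagrees, the reward is penalized.
preds_agree = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]])
preds_disagree = np.array([[1.0, 2.0], [0.0, 2.0], [2.0, 2.0]])
r_agree = penalized_reward(1.0, preds_agree)       # no penalty
r_disagree = penalized_reward(1.0, preds_disagree)  # penalized below 1.0
```

Maximizing the expected penalized reward under the learned model then lower-bounds the true return, because the penalty absorbs the potential overestimation from model error.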