Abstract

I will present recent work exploring how and when can confounded offline data be used to improve online reinforcement learning. We will explore conditions of partial observability and distribution shifts between the offline and online environments, and present results for contextual bandits, imitation learning and reinforcement learning.

Attachment

Video Recording