Clare Lyle (University of Oxford)
Generalization to new environments is critical for reinforcement learning (RL) algorithms deployed in real-world tasks, yet off-the-shelf deep RL algorithms often fail to adapt even to minor changes in their environment. In this talk we’ll focus on the problem of finding a representation that can generalize to new environments with the same causal structure as the training environments. Our key insight will be to demonstrate an equivalence between causal variable selection and the state abstraction framework for MDPs. This equivalence allows us to leverage invariant prediction methods to find state abstractions that generalize to novel observations in the multi-environment setting. This approach can yield provable guarantees with dramatic reductions in sample complexity over those obtained by empirical risk minimization. Finally, we’ll discuss a differentiable learning objective based on invariant prediction which obtains significant improvements in zero-shot transfer to new environments in the deep RL setting.
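As a rough illustration of how variable selection and state abstraction can coincide, the sketch below builds a toy tabular MDP whose state is a pair (x, d), where x drives rewards and dynamics and d is a distractor. It then checks whether an abstraction that keeps only one variable preserves rewards and aggregated transition probabilities. This uses the standard model-irrelevance criterion for state abstractions, which is an assumption here: the abstract does not say which abstraction class the talk relies on. The toy MDP and all names (`is_model_irrelevant`, `keep_x`, `keep_d`) are hypothetical and are not taken from the talk.

```python
from itertools import product

def is_model_irrelevant(states, actions, reward, trans, phi, tol=1e-9):
    """Return True if phi only merges states that share rewards and
    share transition probabilities into every abstract state."""
    abstract_states = {phi(s) for s in states}

    def aggregated(s, a):
        # Probability of landing in each abstract state from (s, a).
        out = {z: 0.0 for z in abstract_states}
        for s_next in states:
            out[phi(s_next)] += trans(s, a, s_next)
        return out

    for s1, s2 in product(states, repeat=2):
        if phi(s1) != phi(s2):
            continue  # only states merged by phi need to agree
        for a in actions:
            if abs(reward(s1, a) - reward(s2, a)) > tol:
                return False
            p1, p2 = aggregated(s1, a), aggregated(s2, a)
            if any(abs(p1[z] - p2[z]) > tol for z in abstract_states):
                return False
    return True

# Hypothetical toy MDP: state = (x, d), with x the relevant bit
# and d a distractor bit that is resampled uniformly at every step.
STATES = [(x, d) for x in (0, 1) for d in (0, 1)]
ACTIONS = [0, 1]

def reward(s, a):
    # Reward depends only on the relevant variable x.
    return float(s[0])

def trans(s, a, s_next):
    # x flips when a == 1; d is uniform and independent of everything.
    return 0.5 if s_next[0] == (s[0] ^ a) else 0.0

def keep_x(s):
    # Abstraction that selects the relevant variable.
    return s[0]

def keep_d(s):
    # Abstraction that selects only the distractor.
    return s[1]

x_is_valid = is_model_irrelevant(STATES, ACTIONS, reward, trans, keep_x)
d_is_valid = is_model_irrelevant(STATES, ACTIONS, reward, trans, keep_d)
```

In this toy case, selecting x yields a valid abstraction while selecting d does not, since d merges states with different rewards. The multi-environment invariance tests and the differentiable objective mentioned in the abstract are not modeled here.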