Abstract

Reinforcement learning (RL) provides a powerful and general framework for learning behaviors, one that is both naturalistic and highly scalable. However, conventional applications of reinforcement learning typically require very large amounts of trial and error. This presents us with a conundrum: how can humans and animals seemingly acquire new skills and behaviors very rapidly, while RL agents require thousands or millions of trials even for simple skills? The key to answering this question lies in understanding the interplay between prior experience and the acquisition of new skills. In the same way that modern self-supervised models can be fine-tuned rapidly with small amounts of labeled data, pre-trained policies could in principle be fine-tuned to new tasks via highly efficient RL methods. But developing such a pre-training and fine-tuning paradigm for RL involves a number of new challenges that are distinct from the more common supervised learning setting. In this talk, I'll discuss the challenges that arise when we try to pre-train and fine-tune RL policies and value functions, the algorithmic tools we can bring to bear on this problem, and the application domains that can be unlocked if we succeed.

Video Recording