Safe Exploration in Reinforcement Learning

Reinforcement learning has seen stunning empirical breakthroughs. At its heart is the challenge of trading off exploration -- collecting data to learn better models -- against exploitation -- using current estimates to make good decisions. In many applications, exploration is a potentially dangerous proposition, as it requires experimenting with actions that have unknown consequences. Hence, most prior work has confined exploration to simulated environments. In this talk, I will formalize the problem of safe exploration as one of optimizing an unknown reward function subject to unknown constraints. Both the reward and the constraints are revealed through noisy experiments, and safety requires that no infeasible action is chosen at any point. Starting with the bandit setting, where actions do not affect state transitions, I will discuss increasingly rich models that capture both known and unknown dynamics. Our approach uses Bayesian inference over the objective and constraints and -- under some regularity conditions -- is guaranteed to be both safe and complete, i.e., to converge to a natural notion of reachable optimum. I will also show experiments on safe automatic parameter tuning of robotic platforms, as well as safe exploration of unknown environments.

