Abstract

While reinforcement learning (RL) has achieved impressive success in applications such as game playing and robotics, work remains to make RL truly practical for optimizing policies in real-world systems. In particular, many systems exhibit clear structure that current RL algorithms do not exploit efficiently. As a result, domain-specific heuristics often outperform RL in both system performance and resource consumption. In this talk, we survey the literature and discuss some recent results on what types of structure may arise in real-world systems and what approaches exist in RL theory for incorporating such structure into algorithm design. Examples of structured Markov decision processes (MDPs) include models that exhibit linearity with respect to known covariates or a latent embedding, models whose dynamics can be decomposed into exogenous and endogenous components, models that exhibit smoothness in the parameters or the trajectories, and models whose value functions can be composed of simple value functions that are weakly coupled.