Recent Results on RL With Gradient Free Optimization

Abstract

This tutorial presents RL theory and design techniques with an emphasis on control foundations, and without any probability prerequisites. Motivation comes in part from deep-rooted control philosophy (e.g., do you think we modeled the probability of hitting a seagull when we designed a control system to go to the moon?). The models used in traditional control design are often absurdly simple, but good enough to gain insight into how to control a rocket or a pancreas. Beyond this, once you understand RL techniques in this simplified framework, it does not take much work to extend the ideas to more complex probabilistic settings. The tutorial is organized as follows:

1. Control crash course: control objectives, state space philosophy and modeling, and the dynamic programming equations

2. Convex formulations of Q-learning and their relationship to deep Q-learning

3. Extremum seeking control and gradient-free approaches to RL

4. Applications to power systems, in particular online optimization with measurement feedback (both gradient-based and gradient-free) and its application to optimal power flow (presented by Andrey Bernstein, NREL; former student of Nahum Shimkin, Technion)

Resources: Extensive lecture notes will be provided prior to the bootcamp (link below). In addition, students are strongly encouraged to view Prof. Richard Murray's tutorial on control fundamentals: https://simons.berkeley.edu/talks/murray-control-1

Attachment

Lecture Notes
