Abstract
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, due to the multi-period nature of the objective, policy gradient algorithms face non-convex optimization problems and can get stuck in suboptimal local minima even for extremely simple problems. This talk will discuss structural properties, shared by several canonical control problems, that guarantee the policy gradient objective function has no suboptimal stationary points despite being non-convex. Time permitting, I’ll then zoom in on the special case of state-aggregated policies and a proof showing that policy gradient converges to better policies than its relative, approximate policy iteration.
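As a rough illustration of what "gradient descent over a parameterized class of policies" can look like, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and not taken from the talk: the two-state, two-action MDP (`P`, `R`, `GAMMA`, `RHO`), the tabular softmax parameterization, and the step size. For simplicity it follows the exact gradient of the expected discounted return (ascent on the return, equivalently descent on its negative) rather than a stochastic estimate of it.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s, a, t] = probability of moving from state s to state t under action a.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
R = np.array([[0.0, 0.0],   # R[s, a] = expected one-step reward
              [1.0, 2.0]])
GAMMA = 0.9                  # discount factor
RHO = np.array([0.5, 0.5])   # initial state distribution


def softmax_policy(theta):
    """Tabular softmax policy: pi[s, a] is proportional to exp(theta[s, a])."""
    z = theta - theta.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def evaluate(pi):
    """Solve the Bellman equations exactly; return (V, Q, P_pi) for policy pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)   # state-to-state transitions under pi
    r_pi = (pi * R).sum(axis=1)             # expected reward per state under pi
    V = np.linalg.solve(np.eye(2) - GAMMA * P_pi, r_pi)
    Q = R + GAMMA * (P @ V)
    return V, Q, P_pi


def objective(theta):
    """Expected discounted return from the initial distribution RHO."""
    V, _, _ = evaluate(softmax_policy(theta))
    return float(RHO @ V)


def policy_gradient(theta):
    """Exact gradient of the objective with respect to theta."""
    pi = softmax_policy(theta)
    V, Q, P_pi = evaluate(pi)
    # Unnormalized discounted state occupancy: d[s] = sum_t GAMMA^t Pr(s_t = s).
    d = np.linalg.solve(np.eye(2) - GAMMA * P_pi.T, RHO)
    advantage = Q - V[:, None]
    # For a softmax policy: dJ/dtheta[s, a] = d[s] * pi[s, a] * advantage[s, a].
    return d[:, None] * pi * advantage


def run(steps=2000, lr=0.5):
    """Gradient ascent from the uniform policy; returns (start, end, final pi)."""
    theta = np.zeros((2, 2))
    start = objective(theta)
    for _ in range(steps):
        theta += lr * policy_gradient(theta)
    return start, objective(theta), softmax_policy(theta)


if __name__ == "__main__":
    start, end, pi = run()
    print("return under the uniform policy:", start)
    print("return after gradient ascent:  ", end)
    print("final policy:\n", pi)
```

On this particular toy instance the second action has a positive advantage in both states under every policy, so gradient ascent improves the return and ends up favoring it. That is a property of the chosen numbers, not a general guarantee: the objective is non-convex in `theta`, which is exactly the difficulty the abstract describes.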