Optimizing Average Reward MDPs with a Generative Model

Workshop

Theory of Reinforcement Learning Reunion

Speaker(s)

Aaron Sidford (Stanford University)

Location

Calvin Lab Auditorium

Date

Monday, Nov. 15, 2021

Time

9:30 – 10 a.m. PT

Abstract

Markov Decision Processes (MDPs) are a fundamental mathematical model to reason about uncertainty and play a key role in reinforcement learning theory. In this talk, I will discuss recent advances in optimizing MDPs given by a generative model. Though near-optimal sample complexities are known for approximately optimizing MDPs with discounted reward functions, obtaining a similar characterization of the average-reward functions has been elusive. In this talk, I will discuss how to improve the sample complexity for this problem using convex optimization tools and how to obtain near-optimal sample complexities in certain regimes by reduction to solving discounted MDPs. This talk will feature joint work with Yujia Jin (arxiv:2008.12776 and arXiv:2106.07046).

Attachment

Slides