Fall 2020

Optimizing Average Reward MDPs with a Generative Model

Monday, Nov. 15, 2021 9:30 am10:00 am PST

Add to Calendar


Aaron Sidford (Stanford University)


Calvin Lab Auditorium

Markov Decision Processes (MDPs) are a fundamental mathematical model to reason about uncertainty and play a key role in reinforcement learning theory. In this talk, I will discuss recent advances in optimizing MDPs given by a generative model. Though near-optimal sample complexities are known for approximately optimizing MDPs with discounted reward functions, obtaining a similar characterization of the average-reward functions has been elusive. In this talk, I will discuss how to improve the sample complexity for this problem using convex optimization tools and how to obtain near-optimal sample complexities in certain regimes by reduction to solving discounted MDPs. This talk will feature joint work with Yujia Jin (arxiv:2008.12776 and arXiv:2106.07046).

PDF icon Slides2.55 MB