Fall 2020

Beyond Worst-Case: Instance-Dependent Optimality in Reinforcement Learning

Monday, Nov. 30, 2020 10:00 am10:30 am PST

Add to Calendar


Martin Wainwright (UC Berkeley)

It is often the case that the guarantees provided by worst-case  analysis can be overly pessimistic, lessening their utility to the  practitioner. Instance-dependent bounds allow one to formalize the  notion that "not all problems are equal"---some are easier than  others. As one foray in this general area, we focus on the problem of  policy evaluation in tabular MDPs, asking whether or not the classical  TD algorithm is optimal. We first prove local minimax lower bounds on  the problem, which reveals the local features that control the  difficulty of the problem. We then show that that standard  TD-learning, even when combined with Polyak-Ruppert averaging, is not  an instance-optimal procedure. Finally, we describe a simple variant  of TD-learning, and establish its instance-optimaltiy.   

Based on joint work with Koulik Khamaru, Ashwin Pananjady, Feng Ruan  and Michael Jordan.