Fellows Talk

Parent Program

Satisfiability: Theory, Practice, and Beyond, Theoretical Foundations of Computer Systems

Location

Zoom

Speaker(s)

Lin Chen (UC Berkeley)

Date

Thursday, Apr. 22, 2021

Time

11 a.m. – 12 p.m. PT

Back to calendar

Description

Speaker: Lin Chen (UC Berkeley)

Title: Infinite-Horizon Offline Reinforcement Learning with Linear Function Approximation: Curse of Dimensionality and Algorithm

Abstract: In this paper, we investigate the sample complexity of policy evaluation in infinite-horizon offline reinforcement learning (also known as the off-policy evaluation problem) with linear function approximation. We identify a hard regime $d\gamma^{2}>1$ , where $d$ is the dimension of the feature vector and $\gamma$ is the discount rate. In this regime, for any $q\in[\gamma^{2},1]$ , we can construct a hard instance such that the smallest eigenvalue of its feature covariance matrix is $q/d$ and it requires $\Omega(\frac{d}{\gamma^{2}(q-\gamma^{2})\varepsilon^{2}}\exp(\Theta(d\gamma^{2})))$ samples to approximate the value function up to an additive error $\varepsilon$ . Note that the lower bound of the sample complexity is exponential in $d$ . If $q=\gamma^{2}$ , even infinite data cannot suffice. Under the low distribution shift assumption, we show that there is an algorithm that needs at most $O(\max\{ \frac{\Vert \theta^{\pi}\Vert _{2}^{4}}{\varepsilon^{4}}\log\frac{d}{\delta},\frac{1}{\varepsilon^{2}}(d+\log\frac{1}{\delta})\} )$ samples ( $\theta^{\pi}$ is the parameter of the policy in linear function approximation) and guarantees approximation to the value function up to an additive error of $\varepsilon$ with probability at least $1-\delta$ .

All scheduled dates:

Upcoming

No Upcoming activities yet

Fellows Talk

All scheduled dates:

Upcoming

Past