Fellows Talk - Tselil Schramm & Zhuoran Yang

Parent Program

Probability, Geometry, and Computation in High Dimensions, Theory of Reinforcement Learning

Location

Zoom link will be sent out to program participants.

Speaker(s)

Tselil Schramm (Stanford University); Zhuoran Yang (Princeton University)

Date

Thursday, Oct. 15, 2020

Time

1 – 2 p.m. PT

Description

Speaker: Tselil Schramm (Stanford University)

Title: Lower Bounds Against Low-Degree Estimators
Abstract: I'll discuss a recent result (joint with Alex Wein) in which we give lower bounds against low-degree polynomial estimation algorithms for problems (such as submatrix recovery and planted dense subgraph) for which testing is known to be easy but estimation is conjectured to be hard. I will also emphasize some open problems that appear to be beyond the reach of our techniques.

Speaker: Zhuoran Yang (Princeton University)

Title: Provably Efficient Exploration in Policy Optimization
Abstract: While policy-based reinforcement learning (RL) has achieved tremendous success in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge this gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. This paper proves that, for episodic Markov decision processes with linear function approximation, unknown transitions, and adversarial rewards with full-information feedback, OPPO achieves sublinear regret. To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
