Tselil Schramm (Stanford University); Zhuoran Yang (Princeton University)
Zoom link will be sent out to program participants.
Speaker: Tselil Schramm (Stanford University)
Title: Lower Bounds Against Low-Degree Estimators
Abstract: I'll discuss a recent result (joint with Alex Wein) in which we give lower bounds against low-degree polynomial estimation algorithms for problems, such as submatrix recovery and planted dense subgraph, for which testing is known to be easy but estimation is conjectured to be hard. I will also highlight some open problems that appear to be beyond the reach of our techniques.
Speaker: Zhuoran Yang (Princeton University)
Title: Provably Efficient Exploration in Policy Optimization
Abstract: While policy-based reinforcement learning (RL) has achieved tremendous success in practice, it is significantly less understood in theory, especially compared with value-based RL. In particular, it remains elusive how to design a provably efficient policy optimization algorithm that incorporates exploration. To bridge this gap, this paper proposes an Optimistic variant of the Proximal Policy Optimization algorithm (OPPO), which follows an "optimistic version" of the policy gradient direction. This paper proves that, for episodic Markov decision processes with linear function approximation, unknown transitions, and adversarial rewards with full-information feedback, OPPO achieves sublinear regret. To the best of our knowledge, OPPO is the first provably efficient policy optimization algorithm that explores.
Based on joint work with Qi Cai, Chi Jin, and Zhaoran Wang.