Fall 2020

Reinforcement Learning from Batch Data and Simulation

Nov. 30 – Dec. 4, 2020

Mengdi Wang (Princeton; chair), Emma Brunskill (Stanford University), Sean Meyn (University of Florida)

Because of COVID-19, we cannot schedule in-person events on the Berkeley campus through December 2020. This workshop will take place online and will be open to the public for online participation. Please register to receive the Zoom webinar access details.

Many of the algorithms and theoretical tools for reinforcement learning assume on-policy data; that is, one can choose a policy and obtain data generated by that policy (either by running the policy or through simulation). In many applications, however, obtaining on-policy data is impossible, and all one has is a batch of data that may have been generated by a nonstationary or even unknown policy. Estimating the value of new policies then becomes a hard statistical problem. This workshop attempts to gather some of the tools needed to find good policies from off-policy data, drawing from the statistics and operations research literature, among others. In particular, it will emphasize statistical complexity, confidence bounds, and safety guarantees. It will also include recent research on policy certification and on robust, reliable policy search. Finally, it will make connections with the system identification and robust control literature from the controls community.

Invited Participants: 

Alekh Agarwal (Microsoft Research Redmond), Luca Baldassarre (Swiss Re), Dimitri Bertsekas (MIT), Jalaj Bhandari (Columbia University), Vivek Shripad Borkar (Indian Institute of Technology Bombay), Sebastien Bubeck (Microsoft Research), Shantanu Prasad Burnwal (IIT Hyderabad), Marco Campi (University of Brescia), Rene Carmona (Princeton University), Lin Chen (Yale University), Brian Christian (UC Berkeley), Munther Dahleh (Massachusetts Institute of Technology), Adithya Munegowda Devraj (University of Florida), Simon Du (University of Washington), Dylan Foster (Massachusetts Institute of Technology), Germano Gabbianelli (Universitat Pompeu Fabra), Anupam Gupta (Carnegie Mellon University), Niao He (University of Illinois at Urbana-Champaign), Rahul Jain (University of Southern California), Nan Jiang (University of Illinois Urbana-Champaign), Chi Jin (Princeton University), Mihailo Jovanovic (University of Southern California), Mikhail Konobeev (University of Alberta), Wouter Koolen (Centrum Wiskunde & Informatica), Akshay Krishnamurthy (Microsoft Research), Jason Lee (Princeton University), Sergey Levine (UC Berkeley), Lihong Li (Google Brain), Yao Liu (Stanford), Tengyu Ma (Stanford University), Shie Mannor (Technion), Aditya Modi (University of Michigan, Ann Arbor), Eric Moulines (Ecole Polytechnique), Vidya Muthukumar (UC Berkeley), Raju Nair (Swiss Re), Joseph Naor (Technion - Israel Institute of Technology), Angelia Nedich (Arizona State University), Gergely Neu (UPF), Ashwin Pananjady (UC Berkeley), Marek Petrik (University of New Hampshire), Balaraman Ravindran (IIT Madras), Wang Ruosong, Daniel Russo (Columbia University), Barna Saha (UC Berkeley), Sergey Samsonov (National Research University Higher School of Economics), Bruno Scherrer (INRIA), Dale Schuurmans (University of Alberta), Alex Shapiro (Georgia Tech), Roshan Shariff (University of Alberta), Mohamad Kazem Shirani Faradonbeh (University of Florida), Aaron Sidford (Stanford University), Sean Sinclair (Cornell University), R. Srikant (University of Illinois at Urbana-Champaign), Phoebe Sun (Swiss Re), Csaba Szepesvári (University of Alberta, Google DeepMind), Ambuj Tewari (University of Michigan), Claire Tomlin (UC Berkeley), Mathukumalli Vidyasagar (IIT Hyderabad), Stefan Wager (Stanford Graduate School of Business), Zhaoran Wang (Northwestern University), Yu-Xiang Wang (UC Santa Barbara), Guan Wang (Swiss Re), Chen-Yu Wei (University of Southern California), Boyi Xie (Swiss Re), Lin Yang (University of California, Los Angeles), Zhuoran Yang (Princeton University), Huizhen Yu (University of Alberta), Christina Yu (Cornell University), Andrea Zanette (Stanford University)