Organizers: Mengdi Wang (Princeton University; chair), Emma Brunskill (Stanford University), Sean Meyn (University of Florida)
Many of the algorithms and theoretical tools for reinforcement learning assume on-policy data; that is, one can choose a policy and obtain data generated by that policy (either by running the policy or through simulation). In many applications, however, obtaining on-policy data is impossible, and all one has is a batch of data that may have been generated by a nonstationary or even unknown policy. Estimating the value of a new policy from such data becomes a hard statistical problem. This workshop aims to gather the tools needed to reliably find good policies from off-policy data, drawing from the statistics and operations research literature, among others. In particular, it will emphasize statistical complexity, confidence bounds, and safety guarantees. It will also cover recent research on policy certification and robust, reliable policy search. Finally, it will make connections with the system identification and robust control literature from the controls community.
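To make the off-policy evaluation problem concrete, here is a minimal sketch of the classical importance-sampling estimator in a two-armed bandit: a batch of data is logged under one (behavior) policy, and the value of a different (target) policy is estimated by reweighting the logged rewards. The policies, reward distributions, and sample size below are illustrative assumptions, not taken from the workshop program.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-armed bandit: a behavior policy logs the data, and we estimate
# the value of a different target policy from that batch alone.
behavior = np.array([0.8, 0.2])    # P(action) under the logging policy
target = np.array([0.3, 0.7])      # P(action) under the policy to evaluate
true_means = np.array([0.5, 1.0])  # expected reward of each arm (unknown in practice)

n = 100_000
actions = rng.choice(2, size=n, p=behavior)
rewards = rng.normal(true_means[actions], 0.1)

# Importance-sampling estimator: reweight each logged reward by the
# ratio of target to behavior probabilities of the logged action.
# Unbiased, but its variance blows up when the policies disagree.
weights = target[actions] / behavior[actions]
v_hat = np.mean(weights * rewards)

true_value = float(target @ true_means)  # 0.3*0.5 + 0.7*1.0 = 0.85
print(f"IS estimate: {v_hat:.3f}, true value: {true_value:.3f}")
```

The large importance weight on the rarely logged arm (3.5 here) is exactly the variance issue that motivates the confidence bounds and safety guarantees emphasized above.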
All events take place in the Calvin Lab auditorium.
Further details about this workshop will be posted in due course. Enquiries may be sent to the organizers at workshop-rl3 [at] lists [dot] simons [dot] berkeley [dot] edu.