Because of the uncertainty caused by COVID-19, it is still unclear whether this workshop will take place in person or online only. Even an in-person version will have significantly reduced capacity; in-person attendance is expected to be limited to long-term program participants. In any case, the workshop will be open to the public for online participation. Please register to receive the Zoom webinar access details. This page will be updated as soon as we have more information.
Many of the algorithms and theoretical tools for reinforcement learning assume on-policy data; that is, one can choose a policy and obtain data generated by that policy (either by running the policy or through simulation). In many applications, however, obtaining on-policy data is impossible, and all one has is a batch of data that may have been generated by a nonstationary or even unknown policy. Estimating the value of new policies then becomes a hard statistical problem. This workshop aims to gather the tools needed to reliably find good policies from off-policy data, drawing from the statistics and operations research literature, among others. In particular, it will emphasize statistical complexity, confidence bounds, and safety guarantees. It will also cover recent research on policy certification and robust, reliable policy search. Finally, it will make connections with the system identification and robust control literature from the controls community.
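To make the off-policy evaluation problem concrete, the following is a minimal sketch (not any particular method from the workshop) of the classical per-trajectory importance-sampling estimator: given batch data from a known behavior policy, it reweights each observed return by the likelihood ratio between the target and behavior policies. The function names and the toy bandit data are illustrative assumptions.

```python
def importance_sampling_ope(trajectories, behavior_prob, target_prob, gamma=0.99):
    """Estimate the value of a target policy from batch data collected
    by a different behavior policy, via per-trajectory importance sampling.

    trajectories  -- list of trajectories, each a list of (state, action, reward)
    behavior_prob -- behavior_prob(s, a): probability the behavior policy took a in s
    target_prob   -- target_prob(s, a): probability the target policy takes a in s
    gamma         -- discount factor
    """
    estimates = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for (s, a, r) in traj:
            # Likelihood ratio accumulates multiplicatively over the trajectory.
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)


# Toy one-step bandit example (hypothetical data): the behavior policy chose
# each of two actions with probability 1/2; the target policy always takes
# action 1, which yields reward 1.
data = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
estimate = importance_sampling_ope(
    data,
    behavior_prob=lambda s, a: 0.5,
    target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
)
print(estimate)  # → 1.0, the true value of the target policy
```

The estimator is unbiased but can have high variance when the two policies differ substantially; controlling that variance, and certifying the resulting estimates, is one theme of the workshop.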
Further details about this workshop will be posted in due course. Enquiries may be sent to the organizers at workshop-rl3 [at] lists.simons.berkeley.edu.