Organizers: Shipra Agrawal (Columbia University; chair), Sébastien Bubeck (MSR), Alan Malek (MIT)
Reinforcement learning is but one way to model interactions with a dynamic environment. Indeed, online algorithms have a long history in the theoretical computer science community, and many of their concepts (such as regret minimization and competitive ratios) have produced very successful algorithms and design principles (such as exponential weights and choosing actions optimistically). At a high level, this workshop aims to involve the theoretical computer science community by asking which classical online learning tools can be successfully applied to reinforcement learning problems, and particularly to the problem of exploration. Of particular interest are tools for designing and analyzing algorithms that are robust to non-stochastic and adversarial data. The bandit problem is well understood in both the stochastic and non-stochastic cases, but what is the right approach to "robustify" exploration in reinforcement learning? One can generalize bandit algorithms to reinforcement learning by considering exploration over a parametric class of Markov decision processes; however, this class is a generalization of linear bandits, a problem that is itself not fully solved, as finite-time, structure-dependent minimax algorithms are unknown. Finally, the "elephant in the room" is to develop flexible methods that scale to a large class of environments and are not sensitive to the so-called realizability assumption.
All events take place in the Calvin Lab auditorium.
Further details about this workshop will be posted in due course. Enquiries may be sent to the organizers at workshop-rl2 [at] lists.simons.berkeley.edu.