Abstract

In this talk we will look at Pure Exploration tasks in the Multi-Armed Bandit setting. We will review the basic Best Arm Identification problem and present the Game Tree Search problem. We will start from lower bounds on the sample complexity, which motivate the Track-and-Stop family of asymptotically instance-optimal algorithms. We will then look at structured bandit settings and at problems with multiple correct answers, and build efficient algorithms for them using saddle point solvers. Finally, we will return to the Game Tree Search problem and discuss its connections with reinforcement learning.