Abstract

The adversarial (a.k.a. non-stochastic) multi-armed bandit problem is an influential marriage between the online learning literature, which concerns sequential decision making without distributional assumptions, and the bandit literature, which concerns learning from partial-information feedback. This tutorial will give an overview of the theory and algorithms for this problem, starting from classical algorithms and their analysis and then moving on to recent advances, including data-dependent regret guarantees, structured bandits, bandits with switching costs, combining bandit algorithms, and more. Special focus will be placed on highlighting the similarities and differences between online learning with full-information feedback and online learning with bandit feedback.
