Abstract

This talk focuses on stochastic bandits, a fundamental model for sequential learning in which the rewards of each action are drawn independently and identically from a fixed distribution. We will cover the main algorithms for this setting, Upper Confidence Bound (UCB) and Thompson Sampling, and then discuss how they can be adapted to incorporate various additional constraints.
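To make the setting concrete, here is a minimal sketch of the UCB1 algorithm on a Bernoulli bandit. The arm means, horizon, and seed are illustrative choices, not from the talk; the confidence bonus uses the standard sqrt(2 ln t / n) form.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Sketch of UCB1 on a Bernoulli bandit: each arm's rewards are
    drawn i.i.d. from a fixed distribution (illustrative parameters)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of pulls per arm
    totals = [0.0] * k  # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # play each arm once to initialise estimates
        else:
            # choose the arm maximising empirical mean + confidence bonus
            arm = max(range(k), key=lambda a: totals[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

After 2000 rounds the arm with the highest mean should receive the large majority of pulls, reflecting UCB's balance of exploration and exploitation.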

Video Recording