Abstract

The Stochastic Multi-Armed Bandit (MAB) is one of the canonical learning problems, studied extensively since the 1950s. In this problem, a learner takes an action in each timestep and observes feedback in the form of a reward; the goal is to play a sequence of actions whose overall reward is comparable to that of the best single action in hindsight. In this talk we will survey a collection of results regarding private learning in the stochastic MAB setting. We will introduce the canonical UCB algorithm and its differentially private version, discuss extensions to linear and contextual bandits, present a lower bound for any ε-differentially private algorithm for this problem, and finally introduce a different privacy-preserving algorithm that matches this lower bound.
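For readers unfamiliar with the setting, the following is a minimal sketch of the standard (non-private) UCB1 strategy on Bernoulli arms; the function name, parameters, and the Bernoulli reward model are illustrative assumptions, not part of the talk. The private variants discussed in the talk replace the empirical statistics below with noisy, privacy-preserving counters.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Illustrative sketch: standard UCB1 on Bernoulli arms (non-private).

    arm_means: true success probabilities of each arm (unknown to the learner).
    horizon:   number of timesteps T.
    Returns the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k   # number of times each arm has been pulled
    sums = [0.0] * k   # cumulative reward observed from each arm
    total = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialize
        else:
            # Choose the arm maximizing empirical mean + confidence bonus.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

# Example: three arms; UCB concentrates its plays on the best arm (mean 0.9).
print(ucb1([0.3, 0.5, 0.9], horizon=10_000))
```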

This is joint work with Roshan Shariff and Touqir Sajed.

Video Recording