A/B/n Testing with Control in the Presence of Subpopulations

Workshop

Theory of Reinforcement Learning Reunion

Speaker(s)

Wouter Koolen (Centrum Wiskunde & Informatica)

Location

Calvin Lab Auditorium

Date

Wednesday, Nov. 17, 2021

Time

10 – 10:30 a.m. PT

Abstract

A key task in reinforcement learning is best policy identification. We study a version of this problem in structured bandits (single state MDPs), and investigate the sample complexity as a function of the action and feedback models. Concretely, we revisit the question: "which versions of my website are better than the control". We address this question in the active sequential testing framework, where we explicitly model the presence of subpopulations among the users. For three modes of interaction depending on what is observed about these subpopulations, we characterize the optimal sample complexity, develop matching algorithms and validate the methods experimentally.

Attachment

Slides