Fall 2020

A/B/n Testing with Control in the Presence of Subpopulations

Wednesday, Nov. 17, 2021 10:00 am10:30 am PST

Add to Calendar


Wouter Koolen (Centrum Wiskunde & Informatica)


Calvin Lab Auditorium

A key task in reinforcement learning is best policy identification. We study a version of this problem in structured bandits (single state MDPs), and investigate the sample complexity as a function of the action and feedback models. Concretely, we revisit the question: "which versions of my website are better than the control". We address this question in the active sequential testing framework, where we explicitly model the presence of subpopulations among the users. For three modes of interaction depending on what is observed about these subpopulations, we characterize the optimal sample complexity, develop matching algorithms and validate the methods experimentally.

PDF icon Slides385.44 KB