Abstract
We consider the infinite-horizon restless bandit problem with the average reward criterion. A fundamental question is how to design computationally efficient policies whose optimality gap diminishes as the number of arms, N, grows large. Existing results on asymptotic optimality all rely on the uniform global attractor property (UGAP), a complex and difficult-to-verify assumption. In this talk, we discuss how this assumption can be removed. We present several new algorithms that offer further insight into how asymptotic optimality can be achieved.