Abstract

We consider the problem of designing "Learning to Control" algorithms for stochastic systems whose model parameters are unknown. Two models are considered: Markov decision processes and linear stochastic systems. A Thompson-sampling-based regret-minimization learning framework is developed to trade off exploration versus exploitation. Sampling from a posterior distribution over the unknown parameters at regular intervals provides the exploration necessary for learning. This obviates the need for expensive exploration computations and makes the algorithms suitable for real-time decision-making and learning. The key is designing a suitable exploration schedule. The proposed algorithms achieve O(√T) expected regret, which is order-optimal. The approach differs from classical adaptive control algorithms in its focus on non-asymptotic regret optimality as opposed to asymptotic stability. Numerical evaluations suggest robust performance of the algorithms.
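
As a rough sketch of the Thompson-sampling idea described above (not the paper's exact algorithm), the snippet below learns to control a scalar linear-quadratic system x_{t+1} = a x_t + b u_t + w_t with unknown (a, b). All numerical values, including a_true, b_true, the noise level, and the fixed resampling interval resample_every, are illustrative assumptions; in particular, the fixed interval stands in for the carefully designed exploration schedule that the abstract identifies as the key ingredient, and the fallback to a zero gain on bad parameter draws is a crude safeguard, not the paper's stabilization mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown to the learner) scalar system: x_{t+1} = a*x_t + b*u_t + w_t
a_true, b_true, sigma = 0.9, 0.5, 0.1
q, r = 1.0, 1.0  # quadratic stage cost: q*x^2 + r*u^2

def riccati_gain(a, b, iters=500):
    """Iterate the scalar discrete-time Riccati equation and return the
    certainty-equivalent feedback gain k (control law u = -k*x)."""
    p = q
    for _ in range(iters):
        p = q + a**2 * p - (a * b * p)**2 / (r + b**2 * p)
        if not np.isfinite(p) or p > 1e8:
            return 0.0  # crude fallback for an uncontrollable/unstable draw
    return a * b * p / (r + b**2 * p)

# Gaussian prior over theta = (a, b); with Gaussian noise the posterior
# stays Gaussian, so the Thompson draw is a cheap multivariate-normal sample.
mu = np.zeros(2)
prec = np.eye(2)                 # posterior precision (inverse covariance)

T, resample_every = 10_000, 100  # fixed interval, purely for illustration
x, k = 0.0, 0.0

for t in range(T):
    if t % resample_every == 0:
        # Thompson step: sample parameters from the posterior and act
        # optimally as if the sample were the true system.
        a_s, b_s = rng.multivariate_normal(mu, np.linalg.inv(prec))
        k = riccati_gain(a_s, b_s)
    u = -k * x
    x_next = a_true * x + b_true * u + sigma * rng.normal()

    # Conjugate Bayesian linear-regression update of the posterior on (a, b)
    z = np.array([x, u])
    prec_next = prec + np.outer(z, z) / sigma**2
    mu = np.linalg.solve(prec_next, prec @ mu + z * x_next / sigma**2)
    prec = prec_next
    x = x_next

print("posterior mean (a, b):", mu)   # should approach (0.9, 0.5)
print("learned gain:", k, " optimal gain:", riccati_gain(a_true, b_true))
```

Note that the only per-step work beyond applying the controller is a conjugate posterior update, and the only periodic work is one posterior sample plus a Riccati solve, which is what makes the sampling-based approach attractive for real-time use.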

Video Recording