Abstract
Sequential decision-making problems in real-world applications often involve multiple long-term constraints and nonlinear objectives. Examples from online advertising include advertisers' budget constraints, nonlinear under-delivery penalties, and the need for diversity in ad allocation. However, most standard models, such as multi-armed bandits, contextual bandits, online learning, and Markov decision processes, simply optimize the sum of the revenue/costs/profits received at every step, without any long-term considerations. Meanwhile, online packing and covering problems allow only limited, linear knapsack-type constraints on consumption. In this talk, I will discuss a series of recent works in which we introduce novel techniques to handle a very general form of convex constraints and objectives in sequential decision making, while providing provably optimal guarantees. Our primal-dual techniques are efficient to implement: they build on fundamental concepts of Fenchel duality in convex programming, while using online learning methods as a black box. Our techniques are remarkably general, as we demonstrate by successfully applying them to a variety of sequential decision-making models: multi-armed bandits, contextual bandits, online packing, and online stochastic convex programming.

This talk is based on the following series of work:

Shipra Agrawal and Nikhil R. Devanur. Bandits with concave rewards and convex knapsacks. In EC 2014 (ACM Conference on Economics and Computation), June 2014.
Shipra Agrawal and Nikhil R. Devanur. Fast algorithms for online stochastic convex programming. In SODA 2015.
Shipra Agrawal, Nikhil R. Devanur, and Lihong Li. An efficient algorithm for contextual bandits with knapsacks, and an extension to concave objectives. In COLT 2016.
Shipra Agrawal and Nikhil R. Devanur. Linear contextual bandits with knapsacks. Unpublished manuscript, 2016.
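To give a concrete sense of the generality discussed above, the following is an illustrative formulation of the kind of online stochastic convex program these works consider; the notation here is chosen for exposition and is not taken verbatim from the papers. At each step t = 1, ..., T, the decision maker selects an action and observes an outcome vector v_t in [0,1]^d drawn from an unknown distribution that depends on the chosen action. The goal is

\[
  \max \; f\!\Big(\tfrac{1}{T}\textstyle\sum_{t=1}^{T} v_t\Big)
  \quad \text{subject to} \quad
  \tfrac{1}{T}\textstyle\sum_{t=1}^{T} v_t \in S,
\]

where f is a concave function and S is a convex set. The classical linear knapsack setting is recovered as the special case in which f is linear and S is a box of per-resource budget constraints.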