Playlist: 16 videos

Structure of Constraints in Sequential Decision-Making

0:49:46
Negin Golrezaei (Massachusetts Institute of Technology)
https://simons.berkeley.edu/talks/optimal-learning-structured-bandits

In this work, we study structured multi-armed bandits, the problem of online decision-making under uncertainty in the presence of structural information. The decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker knows certain structural information about the reward distributions and would like to minimize regret by exploiting it, where regret is the performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thompson sampling algorithms are well known to suffer only minimal regret. As recently pointed out, however, neither algorithm is capable of exploiting structural information that is commonly available in practice. We propose a novel learning algorithm, called DUSA, whose worst-case regret matches the information-theoretic regret lower bound up to a constant factor and which can handle a wide range of structural information. DUSA solves a dual counterpart of the regret lower bound at the empirical reward distribution and follows its suggested play. It is the first computationally viable learning policy for structured bandit problems with asymptotically minimal regret.
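For orientation, here is a minimal sketch of the classical UCB rule that the abstract cites as the unstructured baseline; the Bernoulli arms, horizon, and exploration bonus are illustrative choices, and this is not the DUSA algorithm itself.

```python
import numpy as np

# Minimal UCB1 sketch on illustrative Bernoulli arms (unstructured baseline,
# not DUSA). Arm means, horizon, and the bonus term are assumptions.
rng = np.random.default_rng(1)
means = np.array([0.3, 0.5, 0.7])   # unknown arm means (for simulation only)
T = 10_000

counts = np.zeros(len(means))
sums = np.zeros(len(means))

for t in range(1, T + 1):
    if t <= len(means):
        arm = t - 1                                   # pull each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))                     # optimism under uncertainty
    reward = rng.binomial(1, means[arm])
    counts[arm] += 1
    sums[arm] += reward

pseudo_regret = T * means.max() - counts @ means      # benchmark vs. best fixed arm
print(f"pseudo-regret after {T} rounds: {pseudo_regret:.1f}")
```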
1:02:16
Rad Niazadeh (The University of Chicago Booth School of Business)
https://simons.berkeley.edu/talks/when-matching-meets-batching-optimal-multi-stage-algorithms-and-applications

In several applications of real-time matching of demand to supply in online marketplaces --- for example, matching delivery requests to dispatching centers at Amazon or allocating video ads to users on YouTube --- the platform allows for some latency (or there is an inherent allowance for latency) in order to batch the demand and improve the efficiency of the resulting matching. Motivated by these scenarios, in this talk I investigate the optimal trade-off between batching and inefficiency in the context of designing robust online allocations. In particular, I consider K-stage variants of the classic vertex-weighted bipartite b-matching and AdWords problems in the adversarial setting, where online vertices arrive stage-wise in K batches, in contrast to fully online arrival. Our main result for both problems is an optimal (1-(1-1/K)^K)-competitive (fractional) matching algorithm, improving on the classic (1 − 1/e) competitive ratio known for the online variants of these problems (Mehta et al., 2007; Aggarwal et al., 2011). At a high level, our main technique is to develop algorithmic tools that vary the trade-off between “greediness” and “hedging” of the matching algorithm across stages. We rely on a particular family of convex-programming-based matchings that distribute the demand in a specifically balanced way among the supply in different stages, while carefully modifying the balancedness of the resulting matching across stages. More precisely, we identify a sequence of polynomials with decreasing degrees to be used as strictly concave regularizers of the maximum-weight matching linear program to form these convex programs. By providing a structural decomposition of the underlying graph using the optimal solutions of these convex programs, and by recursively connecting the regularizers together, we develop a new multi-stage primal-dual framework to analyze the competitive ratio of this algorithm. We extend our results to integral allocations in the vertex-weighted b-matching problem with large budgets, and in the AdWords problem with small bid-to-budget ratios. I will also briefly mention a recent extension of these results to the multi-stage configuration allocation problem and its applications to video ads. The talk is based on the following two working papers: (i) "Batching and Optimal Multi-stage Bipartite Allocations", https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3689448 ; (ii) "Optimal Multi-stage Configuration Allocation with Applications to Video Advertising", https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3924687
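To make the guarantee concrete, the short snippet below (an illustrative calculation, not taken from the papers) evaluates the K-stage ratio 1 − (1 − 1/K)^K for a few batch counts: with a single batch the ratio is 1, and as K grows it decreases toward the classic online bound 1 − 1/e ≈ 0.632.

```python
import math

# K-stage competitive ratio 1 - (1 - 1/K)^K for a few batch counts,
# compared against the fully online bound 1 - 1/e.
for K in [1, 2, 3, 5, 10, 100]:
    ratio = 1 - (1 - 1 / K) ** K
    print(f"K = {K:3d}: competitive ratio = {ratio:.4f}")

print(f"online limit (K -> infinity): 1 - 1/e = {1 - 1 / math.e:.4f}")
```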
0:57:30
Sanjay Shakkottai (University of Texas at Austin)
https://simons.berkeley.edu/talks/power-adaptivity-representation-learning-meta-learning-federated-learning

A central problem in machine learning is the following: How should we train models using data generated from a collection of clients/environments, if we know that these models will be deployed in a new and unseen environment? In the setting of few-shot learning, two prominent approaches are: (a) develop a modeling framework that is “primed” to adapt, such as Model-Agnostic Meta-Learning (MAML), or (b) develop a common model using federated learning (such as FedAvg), and then fine-tune the model for the deployment environment. We study both of these approaches in the multi-task linear representation setting. We show that models trained through either approach generalize to new environments because the dynamics of training induce the models to evolve toward the common data representation shared among the clients’ tasks. In both cases, the structure of the bi-level update at each iteration (an inner and outer update with MAML, and a local and global update with FedAvg) holds the key: the diversity among client data distributions is exploited via the inner/local updates, which in turn induce the outer/global updates to bring the representation closer to the ground truth. In both settings, these are the first results that formally show representation learning and derive exponentially fast convergence to the ground-truth representation. Based on joint work with Liam Collins, Hamed Hassani, Aryan Mokhtari, and Sewoong Oh. Papers: https://arxiv.org/abs/2202.03483 , https://arxiv.org/abs/2205.13692
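As a toy illustration of the multi-task linear representation setting, the sketch below alternates a local head fit for each client with a global gradient step on a shared representation. The dimensions, step size, and the exact local least-squares fit are illustrative assumptions; this is a bi-level sketch in the spirit of the local/global structure described above, not the MAML or FedAvg updates analyzed in the papers.

```python
import numpy as np

# Toy bi-level (local head / global representation) sketch for multi-task
# linear regression with a shared low-dimensional representation.
# All dimensions, step sizes, and update rules are illustrative assumptions.
rng = np.random.default_rng(0)
d, k, n_clients, n_samples = 20, 3, 10, 50

B_star = np.linalg.qr(rng.normal(size=(d, k)))[0]         # ground-truth representation
w_star = [rng.normal(size=k) for _ in range(n_clients)]   # per-client heads

clients = []
for i in range(n_clients):
    X = rng.normal(size=(n_samples, d))
    y = X @ B_star @ w_star[i]                             # noiseless client data
    clients.append((X, y))

B = np.linalg.qr(rng.normal(size=(d, k)))[0]               # shared representation (global)
step = 0.05

for t in range(500):
    grads = []
    for X, y in clients:
        # inner/local step: fit this client's head on top of the current B
        w = np.linalg.lstsq(X @ B, y, rcond=None)[0]
        # outer/global contribution: gradient of the squared loss w.r.t. B
        resid = X @ B @ w - y
        grads.append(X.T @ resid[:, None] @ w[None, :] / n_samples)
    B -= step * np.mean(grads, axis=0)
    B = np.linalg.qr(B)[0]                                  # keep columns orthonormal

# principal-angle distance between span(B) and span(B_star)
dist = np.linalg.norm((np.eye(d) - B @ B.T) @ B_star, 2)
print(f"subspace distance to ground-truth representation: {dist:.4f}")
```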
1:03:31
Gabriele Farina (Carnegie Mellon University)
https://simons.berkeley.edu/talks/near-optimal-no-regret-learning-general-convex-games

A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after T repetitions grows polylogarithmically in T, an exponential improvement over the traditional guarantees within the no-regret framework. So far, however, these results have been limited to certain classes of games with structured strategy spaces---such as normal-form and extensive-form games. Whether O(polylog T) regret bounds can be obtained for general convex and compact strategy sets---which arise in many fundamental models in economics and multiagent systems---while retaining efficient strategy updates has remained an important open question. In this talk, we answer this question in the affirmative by establishing the first uncoupled learning algorithm with O(log T) per-player regret in general convex games, that is, games with concave utility functions supported on arbitrary convex and compact strategy sets. Our learning dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space using a self-concordant regularizer that is, peculiarly, not a barrier for the feasible region. Further, our learning dynamics are efficiently implementable given access to a proximal oracle for the convex strategy set, leading to O(log log T) per-iteration complexity; we also give extensions when only a linear optimization oracle is assumed. Finally, we adapt our dynamics to guarantee O(sqrt(T)) regret in the adversarial regime. Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds either in terms of the dependence on the number of iterations or on the dimension of the strategy sets. Based on joint work with Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee, Christian Kroer, and Tuomas Sandholm. Paper link: https://arxiv.org/abs/2206.08742
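As background, the generic optimistic follow-the-regularized-leader template that these dynamics instantiate can be written as follows; this is the standard form with observed utility vectors u_s, a prediction m_{t+1} of the next utility (typically m_{t+1} = u_t), regularizer R, and step size η, and it does not reproduce the paper's lifted space or its particular self-concordant regularizer.

```latex
% Generic optimistic FTRL update (template only; the paper's lifted space
% and self-concordant regularizer are not reproduced here).
x_{t+1} \;=\; \operatorname*{argmax}_{x \in \mathcal{X}}
\Big\{ \eta \, \big\langle x,\; \textstyle\sum_{s=1}^{t} u_s + m_{t+1} \big\rangle \;-\; R(x) \Big\},
\qquad m_{t+1} = u_t .
```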
0:58:45
Adam Wierman (California Institute of Technology)
https://simons.berkeley.edu/node/22753

Making use of modern black-box AI tools such as deep reinforcement learning is potentially transformational for safety-critical systems such as data centers, the electricity grid, transportation, and beyond. However, such machine-learned algorithms typically do not have formal guarantees on their worst-case performance, stability, or safety, and they are difficult to apply in distributed, networked settings. So, while their performance may improve upon traditional approaches in “typical” cases, they may perform arbitrarily worse in scenarios where the training examples are not representative due to, e.g., distribution shift, or in situations where global information is unavailable to local controllers. These are significant drawbacks when considering the use of AI tools in safety-critical networked systems. Thus, a challenging open question emerges: Is it possible to provide guarantees that allow black-box AI tools to be used in safety-critical applications? In this talk, I will provide an overview of a variety of projects from my group that seek to develop robust and localizable tools, combining model-free and model-based approaches to yield AI tools with formal guarantees on performance, stability, safety, and sample complexity.