Playlist: 25 videos

Multi-Agent Reinforcement Learning and Bandit Learning

0:32:06
Doina Precup (McGill University / DeepMind Montreal)
https://simons.berkeley.edu/talks/tbd-393
Multi-Agent Reinforcement Learning and Bandit Learning
0:38:25
Kaiqing Zhang (MIT)
https://simons.berkeley.edu/talks/tbd-398
Multi-Agent Reinforcement Learning and Bandit Learning
0:38:05
Zhuoran Yang (UC Berkeley)
https://simons.berkeley.edu/talks/sequential-information-design-markov-persuasion-process-and-its-efficient-reinforcement
Multi-Agent Reinforcement Learning and Bandit Learning

In today's economy, it is increasingly important for Internet platforms to consider sequential information design in order to align their long-term interests with the incentives of gig service providers. In this talk, I will introduce a novel model of sequential information design, namely Markov persuasion processes (MPPs), in which a sender with an informational advantage seeks to persuade a stream of myopic receivers to take actions that maximize the sender's cumulative utility in a finite-horizon Markovian environment with varying prior and utility functions. Planning in MPPs thus faces the unique challenge of finding a signaling policy that is simultaneously persuasive to the myopic receivers and induces the optimal long-term cumulative utility for the sender. In the online setting where the model is unknown, I will introduce a provably efficient reinforcement learning algorithm, the Optimism-Pessimism Principle for Persuasion Process (OP4), which features a novel combination of the optimism and pessimism principles and enjoys a sublinear regret upper bound.
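To illustrate the persuasiveness constraint that such a signaling policy must respect at every step, here is a minimal one-shot Bayesian persuasion sketch with a binary state and a threshold receiver. It is a toy sketch, not the MPP model or the OP4 algorithm; the prior, the receiver's threshold, and the function name are illustrative assumptions.

```python
def optimal_signaling(prior_good, threshold=0.5):
    """Classic one-shot Bayesian persuasion with a binary state.

    The sender wants the receiver to take an action; the myopic receiver takes
    it only if the posterior probability of the 'good' state is at least
    `threshold`. Returns the probability of recommending the action in each
    state and the sender's expected utility (the probability the action is taken).
    """
    if prior_good >= threshold:
        # The receiver already acts under the prior; no persuasion is needed.
        return {"good": 1.0, "bad": 1.0}, 1.0
    # Always recommend the action in the good state; in the bad state, recommend
    # just often enough that the posterior after a recommendation equals the
    # threshold, so the obedience (persuasiveness) constraint binds exactly.
    q_bad = prior_good * (1 - threshold) / (threshold * (1 - prior_good))
    prob_action = prior_good + (1 - prior_good) * q_bad
    return {"good": 1.0, "bad": q_bad}, prob_action


scheme, utility = optimal_signaling(prior_good=0.3)
print(scheme, utility)  # threshold 0.5 and prior 0.3 give utility 0.3 / 0.5 = 0.6
```

The scheme recommends the action whenever the state is good and, in the bad state, just often enough that the receiver's posterior exactly meets the threshold; this binding obedience constraint mirrors, in a single step, the persuasiveness requirement that an MPP signaling policy must satisfy.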
0:57:50
Kalesha Bullard (DeepMind)
https://simons.berkeley.edu/talks/multi-agent-reinforcement-learning-towards-zero-shot-communication
Multi-Agent Reinforcement Learning and Bandit Learning
0:51:35
Gabriele Farina (Carnegie Mellon University)
https://simons.berkeley.edu/talks/geometry-correlated-strategies-multiplayer-extensive-form-games-positive-and-parametric
Multi-Agent Reinforcement Learning and Bandit Learning

While extensive-form games (EFGs) can be converted into normal-form games (NFGs), doing so comes at the cost of an exponential blowup of the strategy space. So, progress on NFGs and EFGs has historically followed separate tracks, with the EFG community often having to catch up with advances (e.g., last-iterate convergence and predictive regret bounds) from the larger NFG community. In this talk I will show that the Optimistic Multiplicative Weights Update (OMWU) algorithm -- the premier learning algorithm for NFGs -- can be simulated on the normal-form equivalent of an EFG using a kernel trick, with per-iteration time linear in the size of the game tree. The resulting algorithm, Kernelized OMWU (KOMWU), applies more broadly to all convex games whose strategy space is a polytope with 0/1 integral vertices, as long as the kernel can be evaluated efficiently. In the particular case of EFGs, KOMWU closes several standing gaps between NFG and EFG learning by enabling direct, black-box transfer to EFGs of desirable properties of learning dynamics that were so far known to be achievable only in NFGs. Specifically, KOMWU gives the first algorithm that simultaneously guarantees last-iterate convergence, lower dependence on the size of the game tree than all prior algorithms, and \tilde{O}(1) regret when followed by all players.

Based on joint work with Christian Kroer, Chung-Wei Lee, and Haipeng Luo. ArXiv Link: https://arxiv.org/abs/2202.00237
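The kernelized construction is beyond a short snippet, but the underlying OMWU dynamic is easy to state. Below is a minimal sketch of plain OMWU on a small normal-form zero-sum game (rock-paper-scissors); the step size, iteration count, and example game are illustrative assumptions, and this is not the kernelized KOMWU algorithm for EFGs described above.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (illustrative example game).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def omwu_step(x, gain, prev_gain, eta):
    """One Optimistic Multiplicative Weights Update step (written with gains):
    reweight each action by exp(eta * (2 * g_t - g_{t-1})) and renormalize."""
    logits = np.log(x) + eta * (2 * gain - prev_gain)
    w = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return w / w.sum()

eta = 0.1
x = np.ones(3) / 3   # row player's mixed strategy
y = np.ones(3) / 3   # column player's mixed strategy
gx_prev, gy_prev = np.zeros(3), np.zeros(3)

for _ in range(2000):
    gx, gy = A @ y, -A.T @ x      # each player's expected gain per action
    x = omwu_step(x, gx, gx_prev, eta)
    y = omwu_step(y, gy, gy_prev, eta)
    gx_prev, gy_prev = gx, gy

print(x, y)  # both iterates should be close to the uniform minimax strategy
```

The optimistic correction 2*g_t - g_{t-1} is the only difference from plain multiplicative weights, and it is what gives OMWU its last-iterate convergence in two-player zero-sum games.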
0:40:25
Marc Lanctot (DeepMind)
https://simons.berkeley.edu/talks/general-game-theoretic-multiagent-reinforcement-learning
Multi-Agent Reinforcement Learning and Bandit Learning

Regret-minimizing agents in self-play have been used with much success to learn approximately minimax-optimal strategies, scaling to large hold’em poker games and to super-human performance in very large multiplayer games. This prescriptive approach has guided the development of algorithms for two-player zero-sum games, and similarly for fully cooperative games. What about the fully general case: what could a prescriptive agenda look like there? Is there an agent-centric criterion that can be optimized without relying on outside authorities or third parties? In this talk, I will briefly survey recent approaches to game-theoretic multiagent reinforcement learning in general games, and then focus on ideas that could help answer these open questions.
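As a small, concrete instance of the self-play setup mentioned above, the sketch below runs regret matching in self-play on rock-paper-scissors; the time-averaged strategies approach the (uniform) minimax strategy. The choice of regret matching, the example game, and the iteration count are illustrative assumptions rather than the specific methods surveyed in the talk.

```python
import numpy as np

# Rock-paper-scissors payoffs for the row player (illustrative example game).
A = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])

def regret_matching(regrets):
    """Play each action with probability proportional to its positive cumulative
    regret; fall back to uniform when no action has positive regret."""
    pos = np.maximum(regrets, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.ones(len(regrets)) / len(regrets)

n = A.shape[0]
reg_x, reg_y = np.zeros(n), np.zeros(n)
avg_x, avg_y = np.zeros(n), np.zeros(n)

for _ in range(20000):
    x, y = regret_matching(reg_x), regret_matching(reg_y)
    ux, uy = A @ y, -A.T @ x       # expected payoff of each pure action
    reg_x += ux - x @ ux           # regret relative to the mixed strategy played
    reg_y += uy - y @ uy
    avg_x += x
    avg_y += y

# In two-player zero-sum games, the average strategies of no-regret learners
# converge to a minimax equilibrium (here, the uniform strategy).
print(avg_x / avg_x.sum(), avg_y / avg_y.sum())
```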
0:34:11
Simon Du (University of Washington)
https://simons.berkeley.edu/talks/when-offline-two-player-zero-sum-markov-game-solvable
Multi-Agent Reinforcement Learning and Bandit Learning

We study what dataset assumptions permit solving offline two-player zero-sum Markov games. In stark contrast to the offline single-agent Markov decision process, we show that the single-strategy concentration assumption is insufficient for learning the Nash equilibrium (NE) strategy in offline two-player zero-sum Markov games. On the other hand, we propose a new assumption, named unilateral concentration, and design a pessimism-type algorithm that is provably efficient under it. In addition, we show that the unilateral concentration assumption is necessary for learning an NE strategy. Furthermore, our algorithm can achieve minimax sample complexity, without any modification, in two widely studied settings: datasets satisfying the uniform concentration assumption and turn-based Markov games. Our work serves as an important initial step toward understanding offline multi-agent reinforcement learning.
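To give a feel for the pessimism principle in the simplest possible setting, the sketch below works with a one-shot (matrix game) simplification of the offline problem: it forms empirical payoff estimates from a fixed dataset, subtracts a count-based confidence penalty, and computes the row player's maximin strategy by linear programming. This is only an illustrative sketch under those simplifying assumptions, not the algorithm from the talk; the penalty form, the constant beta, and the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def pessimistic_maximin(samples, n_rows, n_cols, beta=1.0):
    """Pessimism sketch for an offline zero-sum matrix game.

    samples: iterable of (row_action, col_action, reward) tuples collected offline.
    Builds an empirical payoff matrix, subtracts a count-based penalty
    beta / sqrt(count) (a lower confidence bound for the row player), and returns
    the row player's maximin strategy and value for the pessimistic matrix.
    """
    sums = np.zeros((n_rows, n_cols))
    counts = np.zeros((n_rows, n_cols))
    for i, j, r in samples:
        sums[i, j] += r
        counts[i, j] += 1
    mean = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)
    penalty = beta / np.sqrt(np.maximum(counts, 1.0))
    penalty[counts == 0] = beta       # unseen action pairs get the largest penalty
    A_pess = mean - penalty           # pessimistic payoffs for the row player

    # Maximin LP: maximize v subject to x^T A_pess[:, j] >= v for every column j,
    # encoded for linprog as minimizing -v with inequality rows [-A_pess[:, j], 1].
    c = np.zeros(n_rows + 1)
    c[-1] = -1.0
    A_ub = np.hstack([-A_pess.T, np.ones((n_cols, 1))])
    b_ub = np.zeros(n_cols)
    A_eq = np.hstack([np.ones((1, n_rows)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0, 1)] * n_rows + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n_rows], res.x[-1]

# Illustrative offline dataset for a 2x2 matching-pennies-style game.
data = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 0.0), (1, 1, 1.0)] * 25
strategy, value = pessimistic_maximin(data, n_rows=2, n_cols=2)
print(strategy, value)
```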
0:44:36
Sylvain Sorin (Sorbonne Universite)
https://simons.berkeley.edu/talks/variants-and-invariants-no-regret-algorithms
Multi-Agent Reinforcement Learning and Bandit Learning

We describe some properties of no-regret algorithms in discrete and continuous time.
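The abstract is brief, so as a concrete reference point here is a minimal discrete-time sketch of one canonical no-regret algorithm, exponential weights (Hedge), whose continuous-time analogue is the replicator dynamic. The step size and the random loss sequence are illustrative assumptions.

```python
import numpy as np

def hedge(losses, eta=0.1):
    """Exponential weights (Hedge), a canonical discrete-time no-regret algorithm.

    losses: array of shape (T, K) with the loss of each of K actions per round.
    Returns the played distributions and the final external regret.
    """
    T, K = losses.shape
    log_w = np.zeros(K)
    plays, incurred = [], 0.0
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        plays.append(p)
        incurred += p @ losses[t]      # expected loss of the mixed play
        log_w -= eta * losses[t]       # multiplicative-weights update
    regret = incurred - losses.sum(axis=0).min()  # vs. the best fixed action
    return np.array(plays), regret

rng = np.random.default_rng(0)
losses = rng.random((5000, 4))   # illustrative i.i.d. loss sequence
_, regret = hedge(losses)
print(regret / 5000)             # the average regret shrinks toward zero
```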
0:36:25
Alex Peysakhovich (Meta AI)
https://simons.berkeley.edu/talks/what-does-machine-learning-offer-game-theory-and-vice-versa
Multi-Agent Reinforcement Learning and Bandit Learning

One of the major reasons that deep learning for supervised learning has been so successful is that deep networks allow us to tame high dimensional input spaces and transform them into simpler, lower dimensional objects that we can work with. This talk argues that a similar approach using ML techniques to simplify scenarios to their "low dimensional game theoretic representation" can yield progress in building agents that cooperate and coordinate, as well as allow us to apply mechanism design ideas at large scale.
0:40:34
Mark Sellke (Stanford)
https://simons.berkeley.edu/talks/multi-player-bandits-no-collisions
Multi-Agent Reinforcement Learning and Bandit Learning

In the stochastic multi-player bandit problem, m > 1 players cooperate to maximize their total reward on a set of K > m bandit arms. However, the players cannot communicate online and are penalized (e.g., they receive no reward) if they collide by pulling the same arm at the same time. This problem was introduced in the context of wireless radio communication and serves as a natural model for decentralized online decision-making. There have been many results for different versions of this model, most of which rely on a small number of collisions in order to communicate implicitly. However, it was later realized that nearly optimal T^{1/2} regret is possible with no collisions (and hence no communication) at all. I will discuss this construction, as well as our recent characterization of the Pareto-optimal trade-offs for gap-dependent regret without communication. Based on collaborations with Sebastien Bubeck, Thomas Budzinski, and Allen Liu.
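To make the model concrete, the toy simulation below implements the collision rule (colliding players receive zero reward) together with a naive staggered explore-then-commit strategy that uses no online communication. This is an illustrative sketch only: the strategy, arm means, horizon, and exploration length are assumptions, and it is not the no-collision T^{1/2} construction or the Pareto-optimal trade-off analysis from the talk.

```python
import numpy as np

def simulate(means, m, T, explore=2000, seed=0):
    """Toy multi-player bandit with the collision rule from the model above:
    players pulling the same arm in the same round all receive zero reward.

    Illustrative strategy (NOT the talk's algorithm): each player knows its own
    index, explores in a staggered round-robin (which is collision-free), then
    commits to the arm whose empirical mean ranks at its index. With enough
    exploration the rankings agree across players and no further collisions occur.
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    pulls = np.zeros((m, K))
    rewards = np.zeros((m, K))
    total = 0.0
    commit = None
    for t in range(T):
        if t < explore:
            arms = [(t + i) % K for i in range(m)]    # staggered, hence no collisions
        else:
            if commit is None:
                est = rewards / np.maximum(pulls, 1)   # each player's own estimates
                commit = [int(np.argsort(-est[i])[i]) for i in range(m)]
            arms = commit
        for i, a in enumerate(arms):
            collided = arms.count(a) > 1
            r = 0.0 if collided else rng.binomial(1, means[a])
            pulls[i, a] += 1
            rewards[i, a] += r
            total += r
    best = sum(sorted(means, reverse=True)[:m]) * T    # best collision-free allocation
    return best - total                                # realized group regret

print(simulate(means=[0.9, 0.8, 0.5, 0.3], m=2, T=20000))
```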