Abstract

Reinforcement learning (RL) has recently achieved tremendous success in several artificial intelligence applications. Many applications at the forefront of RL involve "multiple agents", e.g., playing chess and Go, autonomous driving, and robotics. In this talk, I will introduce several recent works on multi-agent reinforcement learning (MARL) with theoretical guarantees. Specifically, we focus on solving the most basic multi-agent RL setting, infinite-horizon zero-sum stochastic games (Shapley, 1953), using three common RL approaches: model-based, value-based, and policy-based. First, we show that in the tabular setting, "model-based multi-agent RL" (estimating the model first and then planning) achieves near-optimal sample complexity when a generative model of the game environment is available. Second, we show that a simple variant of "Q-learning" (value-based) can find the Nash equilibrium of the game even when the agents run it independently, i.e., in a "fully decentralized" fashion. Third, we show that "policy gradient" methods (policy-based) can solve zero-sum stochastic games with linear dynamics and quadratic costs, a setting equivalent to a robust and risk-sensitive control problem. Through this connection to robust control, we discover that our policy gradient methods automatically preserve the robustness of the system throughout the iterations, a phenomenon we refer to as "implicit regularization". Time permitting, I will also discuss some ongoing and future directions along these lines.
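
For concreteness, the infinite-horizon discounted zero-sum stochastic game referenced above is characterized by its Shapley (minimax Bellman) equation; the sketch below uses standard notation (state s, mixed strategies x, y over the two players' action sets A and B, reward r, transition kernel P, discount factor gamma) that is not taken from the talk itself:

\[
V^{*}(s) \;=\; \max_{x \in \Delta(\mathcal{A})} \; \min_{y \in \Delta(\mathcal{B})} \; \mathbb{E}_{a \sim x,\, b \sim y}\!\left[ r(s, a, b) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s, a, b)}\!\left[ V^{*}(s') \right] \right].
\]

Here one player maximizes and the other minimizes the same discounted return; a Nash equilibrium of the game attains this minimax value at every state, and the three approaches in the talk (model-based planning, decentralized Q-learning, and policy gradient) can be viewed as different ways of approximating this fixed point from data.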

Video Recording