Abstract

Multiagent reinforcement learning has received growing interest across a variety of problem settings and applications. We will first present our recent work on learning decentralized policies in networked multiagent systems under a cooperative setting. Specifically, we propose a Scalable Actor Critic (SAC) framework that exploits the network structure and finds a local, decentralized policy that is an O(ρ^κ)-approximation of a first-order stationary point of the global objective, for some ρ∈(0,1). Motivated by the question of characterizing the performance of such stationary points, we then look into the case where the state can be shared among agents but each agent must still act according to a decentralized policy. We show that even when agents have identical interests, the first-order stationary points correspond only to Nash equilibria. This observation naturally leads to the use of a stochastic game framework to characterize the performance of policy gradient methods with decentralized policies in multiagent MDP systems.
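
One simplified way to read the approximation guarantee above (the notation is an illustrative sketch, not the exact statement from the paper, and treating κ as the radius of the local neighborhood each agent uses is an assumption of this sketch): because inter-agent influence decays geometrically with graph distance, the policy parameter θ returned by the local, decentralized procedure can be viewed as satisfying a bound of roughly the form

\[
  \bigl\| \nabla_{\theta}\, J(\theta) \bigr\| \;\le\; \varepsilon \;+\; C\,\rho^{\kappa}, \qquad \rho \in (0,1),
\]

where J is the global objective, ε accounts for the optimization and sampling error of the actor-critic updates, and C is a problem-dependent constant. Under this reading, enlarging the neighborhood radius κ shrinks the truncation bias at the cost of each agent using more of the network's information.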

Joint work with Guannan Qu, Adam Wierman, Runyu Zhang, and Zhaolin Ren.

Video Recording