In this talk, we'll dive into reinforcement learning and explore how policy gradient methods can help agents learn winning strategies in complex, dynamic environments. We'll examine the convergence properties of these methods in stochastic games, where the stage game played at each step depends on an evolving underlying state, and show that they converge with high probability to second-order stationary Nash equilibrium policies. We'll also discuss how the convergence rate improves when the Nash policies are deterministic, enabling agents to learn optimal strategies in a finite number of iterations. Join us as we explore how reinforcement learning can be used to design efficient algorithms that learn winning strategies in a variety of settings.
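As a minimal illustration of the policy gradient idea (not the specific algorithm analyzed in the talk), the sketch below runs a REINFORCE-style update on a softmax policy for a simple multi-armed bandit; the arm means, learning rate, and step count are all hypothetical choices for demonstration:

```python
import math
import random

def softmax(prefs):
    """Convert preference scores into action probabilities."""
    exps = [math.exp(p - max(prefs)) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(true_means, lr=0.1, steps=5000, seed=0):
    """REINFORCE on a multi-armed bandit: a minimal policy gradient sketch.

    true_means are hypothetical expected rewards for each arm. Under a
    softmax policy, the gradient of log pi(a) with respect to preference
    i is (1[i == a] - pi(i)), which gives the update used below.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(true_means)
    for _ in range(steps):
        probs = softmax(prefs)
        a = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(true_means[a], 1.0)  # noisy reward sample
        # Gradient ascent step: prefs += lr * reward * grad log pi(a)
        for i in range(len(prefs)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            prefs[i] += lr * reward * grad
    return softmax(prefs)

probs = reinforce_bandit([1.0, 0.2, -0.5])
```

After training, the learned policy should concentrate most of its probability on the best arm (index 0). In the multi-agent stochastic-game setting discussed in the talk, each player runs an update of this flavor on its own policy, and the analysis concerns where the joint dynamics converge.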