Description

Title: Near-Optimal Learning of Extensive-Form Games with Imperfect Information

Abstract: Imperfect Information Games such as Poker constitute an important challenge for modern artificial intelligence. In this talk we consider the problem of learning Imperfect-Information Extensive-Form Games (IIEFGs), a celebrated formulation for games involving both imperfect information and sequential play. IIEFGs is a generalization of normal-form games, and is related to (but poses quite different challenges from) Markov Games.In the first part of the talk, we will review the definition and basic properties of IIEFGs, and go through existing algorithms such as Online Mirror Descent and Counterfactual Regret Minimization. In the second part of the talk, we present our new result---A first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum IIEFGs, where $X,Y$ are the number of information sets and $A,B$ are the number of actions for the two players. This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$, and matches the information-theoretic lower bound up to logarithmic factors. We achieve this sample complexity by two new algorithms: Balanced Online Mirror Descent, and Balanced Counterfactual Regret Minimization. Both algorithms rely on novel approaches of integrating \emph{balanced exploration policies} into their classical counterparts.

Bio: Yu Bai is currently a Senior Research Scientist at Salesforce AI Research. Prior to joining Salesforce, Yu completed his PhD in Statistics at Stanford University. Yu’s research interest lies broadly in machine learning, with recent focus on the theoretical foundations of reinforcement learning and games, deep learning, and uncertainty quantification.

List of related paper: https://arxiv.org/abs/2202.01752