Abstract

A number of important applications, including hyperparameter optimization, robust reinforcement learning, pure exploration, and adversarial learning, have a minmax (zero-sum) game at the core of their mathematical abstraction. Owing to the computationally intensive nature of such problems, it is of interest to obtain provable guarantees for first-order optimization methods. Toward this goal, this talk presents new results on the local stability and convergence of gradient descent-ascent, with a focus on the role that a timescale separation between the players' learning rates plays in the dynamics. Existing analyses of gradient descent-ascent have primarily assumed either that the players share a learning rate or that the ratio of their learning rates approaches infinity. When the players share a learning rate, the dynamics are not guaranteed to converge to a game-theoretically meaningful equilibrium in general. In contrast, as the ratio of learning rates approaches infinity, gradient descent-ascent is known to converge to strict local minmax equilibria. The work covered in this talk bridges the gap between these extremes by showing that there exists a finite critical value of the learning rate ratio such that gradient descent-ascent converges to strict local minmax equilibria whenever the ratio exceeds it. An explicit construction of this critical ratio will be presented, along with the resulting rates of convergence to strict local minmax equilibria.

After presenting the key results, the talk will discuss open questions and the impact on online learning, not only in continuous games but in other important machine learning applications. Throughout, illustrative numerical examples will demonstrate how introducing a timescale separation can significantly affect convergence. Time permitting, applications to improving the training of generative adversarial networks on image datasets will also be presented.
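To make the update rule concrete, below is a minimal sketch (not taken from the talk) of timescale-separated gradient descent-ascent on a hand-picked quadratic saddle problem. The coefficients, step size gamma, ratio tau, and function names are all illustrative assumptions; for this particular quadratic, linearizing the dynamics suggests a critical ratio of 1, so a shared learning rate (tau = 1) slowly diverges while tau = 4 converges.

```python
import math

# Illustrative quadratic saddle: f(x, y) = 0.5*a*x^2 + b*x*y - 0.5*d*y^2.
# With a = -1, b = 2, d = 1, the origin is a strict local minmax
# equilibrium (the Schur complement a + b^2/d = 3 is positive) but not
# a local Nash equilibrium, since the x-curvature a is negative.
a, b, d = -1.0, 2.0, 1.0

def grad(x, y):
    """Gradients of f with respect to x (min player) and y (max player)."""
    return a * x + b * y, b * x - d * y

def tau_gda(tau, gamma=0.01, steps=20000, x0=1.0, y0=1.0):
    """Gradient descent-ascent with learning rate ratio tau.

    The minimizing player uses step size gamma; the maximizing player
    uses tau * gamma.
    """
    x, y = x0, y0
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - gamma * gx, y + tau * gamma * gy
    return math.hypot(x, y)  # distance from the equilibrium (0, 0)

# Shared learning rate: the iterates spiral away from the equilibrium.
# A ratio above the critical value (here tau* = 1) yields convergence.
print("tau = 1 :", tau_gda(1.0))   # grows over the run (diverges)
print("tau = 4 :", tau_gda(4.0))   # shrinks toward 0 (converges)
```

The point of the sketch is that the same first-order method, with the same step size for the minimizing player, switches from divergent to convergent purely by scaling the maximizing player's learning rate past the critical ratio.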

Video Recording