Online Learning in Games

The archetypal setting of online learning can be summarized as follows: at each stage of a repeated decision process, an agent selects an action from some set X (continuous or discrete), obtains a reward based on an a priori unknown payoff function, and the process repeats. The most widely used optimization criterion in this framework is regret minimization, i.e., minimizing the aggregate payoff gap between the chosen actions and the best fixed action in hindsight. However, in a multi-agent environment, where each agent's reward is determined by the actions of all players via a fixed mechanism (the game), finer criteria apply, chief among them convergence to a Nash equilibrium. In this talk, I will survey some recent equilibrium convergence and non-convergence results, in both finite and continuous games.
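The regret criterion above can be made concrete with a small sketch. The snippet below implements the Hedge (exponential weights) algorithm, a standard no-regret learner under full-information feedback; it is offered as an illustrative example of the setting, not necessarily the specific method covered in the talk, and the payoff sequence is an invented toy instance:

```python
import math

def hedge(payoff_rows, eta):
    """Run the Hedge (exponential weights) learner on a sequence of
    payoff vectors in [0, 1] (full-information feedback).

    Returns (total expected payoff, external regret), where regret is
    the gap to the best fixed action in hindsight.
    """
    n = len(payoff_rows[0])
    weights = [1.0] * n
    cum = [0.0] * n          # cumulative payoff of each fixed action
    total = 0.0              # cumulative expected payoff of the learner
    for u in payoff_rows:
        s = sum(weights)
        probs = [w / s for w in weights]          # mixed strategy this round
        total += sum(p * ui for p, ui in zip(probs, u))
        for i in range(n):
            cum[i] += u[i]
            weights[i] *= math.exp(eta * u[i])    # reward good actions
    return total, max(cum) - total

# Toy instance: two actions, alternating payoff vectors; action 1 is
# best in hindsight. With eta ~ sqrt(log n / T), regret is O(sqrt(T log n)).
T = 1000
rows = [[0.3, 0.7] if t % 2 == 0 else [0.6, 0.5] for t in range(T)]
total, regret = hedge(rows, eta=math.sqrt(math.log(2) / T))
print(f"regret after {T} rounds: {regret:.1f}")
```

The sublinear growth of the regret is what makes the per-round average gap to the best fixed action vanish as T grows; in multi-agent settings, however, all players running such a no-regret dynamic need not converge to a Nash equilibrium, which is the gap the talk addresses.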
