Abstract

Machine learning algorithms are increasingly being deployed into environments in which they must contend with other strategic agents with potentially misaligned objectives. The presence of these other agents breaks many of the underlying assumptions on which machine learning algorithms are built and can introduce non-stationarity into the environment, giving rise to surprising dynamics and behaviors. In this talk, we will discuss two facets of this problem. In the first part, we will describe a new algorithm for learning in zero-sum Markov games, based on Q-learning and best-response-type dynamics, that is independent, payoff-based, rational, and provably convergent. In particular, we prove that this algorithm achieves the first known √T sample-complexity guarantee for payoff-based independent learning (up to a smoothing bias). In the second part of the talk, we will discuss the power that strategic agents have over machine learning algorithms when they strategize as collectives.
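
The abstract does not spell out the update rule, so the following is only a rough illustration of what independent, payoff-based, smoothed best-response dynamics can look like from one player's perspective in a zero-sum Markov game; it is not the speakers' algorithm. The `env` interface, the decaying step size, and the logit (softmax) smoothing with temperature `tau` are all assumptions for illustration. The smoothing is also the source of the kind of bias the abstract alludes to.

```python
import numpy as np

def smoothed_best_response(q_values, tau):
    """Logit (softmax) response with temperature tau.
    As tau -> 0 this approaches an exact best response."""
    z = q_values / tau
    z -= z.max()  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def independent_q_learning(env, n_states, n_actions,
                           steps=10_000, tau=0.1, gamma=0.95):
    """Payoff-based independent learning for one player: the player
    observes only the current state and its own realized payoff,
    never the opponent's action or policy. `env` is a hypothetical
    interface with reset() -> state and step(action) -> (state, payoff);
    the opponent's play is folded into the environment."""
    q = np.zeros((n_states, n_actions))  # Q-values over own actions only
    counts = np.zeros((n_states, n_actions))
    state = env.reset()
    for _ in range(steps):
        policy = smoothed_best_response(q[state], tau)
        action = np.random.choice(n_actions, p=policy)
        next_state, payoff = env.step(action)
        counts[state, action] += 1
        alpha = 1.0 / counts[state, action]  # decaying step size
        # Continuation value under the current smoothed response.
        v_next = smoothed_best_response(q[next_state], tau) @ q[next_state]
        q[state, action] += alpha * (payoff + gamma * v_next - q[state, action])
        state = next_state
    return q
```

Because each player runs this loop using only its own payoffs, the dynamics are independent and payoff-based in the sense the abstract describes; from either player's viewpoint the other player is simply part of a non-stationary environment.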