Abstract

A common variant of infinite-horizon stochastic games are turn-based, in the sense that only one agent can take an action at a state at a given time. We initiate an algorithmic study of such turn-based stochastic games (TBSG) with both discounted and averaged general-sum rewards, and ask whether it is possible to compute a Nash equilibrium (NE) of a TBSG in time that is polynomial in the number of states (S) and the number of actions per state (A). On the one hand, we show that non-stationary NE are computable in poly(S,A) time and stationary NE are computable in poly(A) time. On the other hand, we show that computing a multiplayer approximate stationary NE is PPAD-complete in S. Our results imply PPAD-hardness for computing stationary coarse correlated equilibria in general-sum simultaneous stochastic games. This constitutes a non-standard separation in complexity between zero-sum and general-sum settings for infinite-horizon stochastic games. Beyond these results, we provide a number of additional characterizations of related stochastic games. For example, we show that computing a stationary NE of an infinite-horizon simultaneous stochastic game is in PPAD despite the inherent non-convexity in utilities. We also identify some special cases of general-sum TBSG for which pure stationary NE always exist and are computable in poly(S,A) time. Underlying all of our results are new structural properties for stochastic games that may be of independent interest.

Video Recording