
Abstract
Artificially intelligent agents are increasingly being integrated into human decision-making. Soon large language model (LLM) agents will be interacting with humans and among themselves with a mixture of goals and incentives. This context motivates a game-theoretic perspective. Rather than simply evaluating these agents on the reward achieved in a static environment, we need to consider their behaviour in the context of the ecosystem of agents with which they interact. In this talk I will discuss my group's progress on studying RL training of agent policies in general-sum games, which are neither purely cooperative nor purely competitive. In particular, I'll discuss our novel approach, known as Advantage Alignment, a family of algorithms derived from first principles that efficiently and intuitively guides policy learning towards more cooperative and effective policies. I'll conclude by discussing our progress in applying these methods in the context of LLM agent interactions.
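
To give a rough flavour of opponent-shaping methods of this kind, the sketch below shows a simplified policy-gradient update in which an agent's own advantages are coupled with a co-player's advantages. This is an illustrative assumption, not the exact formulation from the Advantage Alignment paper; the function name, the `beta` weighting, and the discounted accumulation are all hypothetical choices made for the example.

```python
# Illustrative sketch (assumed form, not the paper's exact algorithm):
# a REINFORCE-style gradient for one agent in a two-player general-sum
# game, plus an "alignment" term coupling both agents' advantages.
import numpy as np

def aligned_policy_grad(logp_grads, adv_self, adv_other, gamma=0.99, beta=0.1):
    """logp_grads: (T, D) grads of log pi(a_t|s_t) w.r.t. policy params.
    adv_self: (T,) advantages of the learning agent.
    adv_other: (T,) advantages of the co-player.
    """
    T, D = logp_grads.shape
    grad = np.zeros(D)
    past = 0.0  # discounted sum of the agent's own past advantages
    for t in range(T):
        # Standard policy-gradient term: reinforce the agent's own advantage.
        grad += adv_self[t] * logp_grads[t]
        # Assumed alignment term: up-weight actions when the accumulated
        # own advantage and the co-player's current advantage agree in
        # sign, nudging learning towards mutually beneficial behaviour.
        grad += beta * past * adv_other[t] * logp_grads[t]
        past = gamma * past + adv_self[t]
    return grad

# Toy usage: a 5-step trajectory with 3 policy parameters.
rng = np.random.default_rng(0)
g = aligned_policy_grad(rng.normal(size=(5, 3)),
                        rng.normal(size=5), rng.normal(size=5))
print(g)
```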