
Abstract
Formal task specifications, such as those expressible in temporal logic or as automata, have gained interest as an alternative means of specifying objectives for deep reinforcement learning (RL) agents. Unlike traditional Markovian rewards, formal specifications are composable, transfer easily across environments, and offer a precise notion of satisfaction. Despite these advantages, formal specifications have seen limited adoption in RL, in large part due to the difficulty of expressing them as readily optimizable objectives. In this talk, I will discuss two directions in which formal specifications can provide a distinct benefit over traditional reward functions in training RL agents. First, I will demonstrate how the precision of specifications allows us to disambiguate between correctness and optimality conditions, which can help ensure that learned policies achieve the desired behavior. I will then discuss how the compositionality of specifications can enable more efficient learning in cooperative multi-agent settings. Throughout the talk, I will address how we can mitigate common pitfalls of specification-guided RL, such as optimization difficulty and reward sparsity. I will close with a brief discussion of when we should consider formal specifications in general, when we should avoid them, and the role alternative reward objectives can play in RL.