Abstract

What could language models learn about causality and experimentation from their passive training? Purely observational learning is inherently limited for causal discovery. In this talk, however, I will draw an important distinction between observational and passive learning, and argue that language models learn passively, but from interventional data. I will then show empirically that agents trained via passive imitation on expert interventional data can learn generalizable causal strategies, which they can apply at test time to discover causal structures never seen in training. This is possible even in a complex environment with high-dimensional observations, when supported by natural language explanations. Furthermore, explanations can even allow passive learners to generalize out-of-distribution from perfectly-confounded training data. Finally, I will show that language models, trained only on passive next-word prediction, can generalize causal intervention strategies from a few-shot prompt containing examples of experimentation, together with explanations and reasoning. These results highlight the surprising power of passive learning of active causal strategies, and may help to explain the behaviors and capabilities of language models. I will close by reflecting on some open questions about how to enable AI systems to use explanations in a more human-like way.