Animals execute goal-directed behaviors despite the limited range and precision of their sensors. To cope, they explore their environments and store memories that maintain estimates of important information that is not presently available. Recently, artificial intelligence (AI) agents that merge reinforcement learning (RL) algorithms with deep neural networks have made breathtaking progress, learning to perform tasks from sensory input, even at a human level. The excitement surrounding these results has led to the pursuit of related ideas as explanations of non-human animal learning. However, we demonstrate that contemporary RL algorithms are unable to solve simple tasks when enough information is concealed from the agent's sensors, a property called ``partial observability''. An obvious requirement for handling partially observed tasks is access to extensive memory, but we show that memory is not enough; it is critical that the right information be stored in the right format. We develop a model, the Memory, RL, and Inference Network (MERLIN), in which memory formation is guided by a process of predictive modeling. MERLIN solves new classes of tasks in 3D virtual reality environments in which partial observability is severe and memories must be maintained over long durations. Our model represents an advance in AI and, we hope, a useful conceptual framework for neuroscience: it demonstrates a cognitive architecture that can solve canonical behavioral tasks in psychology and neurobiology without strong simplifying assumptions about the dimensionality of sensory input or the duration of experiences.
Work done in conjunction with Chia-Chun Hung, David Amos, Mehdi Mirza, Jack Rae, Arun Ahuja, Agnieszka Grabska-Barwinska, Piotr Mirowski, Timothy Lillicrap, and others at DeepMind.