Learning to Reason with LLMs
Large language models (LLMs) have demonstrated remarkable capabilities in generating coherent text and performing a wide range of natural language tasks. Nevertheless, their ability to carry out complex, general reasoning has remained limited. In this recent talk from the Workshop on Transformers as a Computational Model, Noam Brown (OpenAI) describes OpenAI’s new o1 model, an LLM trained via reinforcement learning to generate a hidden chain of thought before producing its response. Brown and his collaborators have found that o1’s performance consistently improves both with more reinforcement learning (train-time) compute and with more inference (test-time) compute. OpenAI’s o1 surpasses previous state-of-the-art models on a variety of benchmarks that require reasoning, including mathematics competitions, programming contests, and advanced science question sets. Brown discusses the implications of scaling this paradigm further.