Scaling laws, which demonstrate a predictable relationship between AI’s performance and the amount of training data, compute, and model size, continue to drive progress in AI. In this talk, we present inference compute as a new frontier for scaling LLMs. Our recent work, Large Language Monkeys, shows that coverage (the fraction of problems solved by any attempt) persistently scales with the number of samples over four orders of magnitude. Interestingly, the relationship between coverage and the number of samples is log-linear and can be modeled with an exponentiated power law, suggesting the existence of inference-time scaling laws. In domains where answers can be automatically verified, like coding and formal proofs, we show that these increases in coverage directly translate into improved performance. In domains without verifiers, we find that identifying correct samples out of many generations remains challenging. Common methods for picking correct solutions from a collection of samples, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget. We then present Archon, a framework for automatically designing effective inference-time systems composed of one or more LLMs. Archon selects, combines, and stacks layers of inference-time operations, such as repeated sampling, fusion, ranking, model-based unit testing, and verification, to construct optimized LLM systems for target benchmarks. It alleviates the need for automated verifiers by enabling strong pass@1 performance across diverse instruction-following, reasoning, math, and coding tasks. Finally, we discuss some of our recent hardware acceleration techniques to improve the computational efficiency of serving LLMs.
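As a rough illustration of the coverage metric described above, the sketch below estimates coverage at a sample budget k from per-problem counts of correct samples, using the standard unbiased pass@k estimator. It also evaluates an exponentiated power law of the form exp(a * k^(-b)). The function names, the example counts, and this particular parameterization of the power law are assumptions made for illustration; the abstract does not specify them.

```python
from math import comb, exp


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the chance that at least one of k samples is
    correct, given that c of n drawn samples were correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the size-k subsets containing no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def coverage(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems: the estimated fraction of problems
    solved by at least one of k attempts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


def exp_power_law(k: float, a: float, b: float) -> float:
    """Assumed model form: coverage ~ exp(a * k**(-b)).
    With a < 0 and b > 0 the value rises toward 1 as k grows."""
    return exp(a * k ** (-b))


if __name__ == "__main__":
    # Hypothetical data: 3 problems, 4 samples each, with 0, 1, and 4 correct.
    counts = [0, 1, 4]
    for k in (1, 2, 4):
        print(k, coverage(counts, n=4, k=k))
```

In practice the parameters a and b would be fitted to measured coverage values; the fitting step is omitted here.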
Abstract not available.
Existing multimodal models typically have custom architectures that are designed for specific modalities (image->text, text->image, text only, etc.). In this talk, I will present our recent work on a series of early-fusion mixed-modal models trained on arbitrary mixed sequences of images and text. I will discuss and contrast two model architectures, Chameleon and Transfusion, that make very different assumptions about how to model mixed-modal data, and argue for moving from a tokenize-everything approach to newer models that are hybrids of autoregressive transformers and diffusion. I will also cover recent efforts to better understand how to train such models more stably at scale without excessive modality competition, using a mixture-of-transformers technique. Together, these advances lay a possible foundation for universal models that can understand and generate data in any modality, and I will also sketch some of the steps that we still need to focus on to reach this goal.
Can we build neural architectures that go beyond Transformers by leveraging principles from dynamical systems?
In this talk, we'll introduce a novel approach to sequence modeling that draws inspiration from the paradigm of online control of dynamical systems to achieve long-range memory, fast inference, and provable robustness.