Historically, making magic states was expected to be the dominant cost of fault-tolerant quantum computation. This talk will discuss a construction, called "magic state cultivation", that (building on many previous results) seriously questions this...
There has been a lot of recent “buzz” and hype around scaling up test-time compute and how it provides a new dimension for improving the reasoning and performance of LLMs. In this talk, I will give my perspective on the broad question of “what it means to optimize test-time compute and how we could go about it”.
First, I will formalize the problem of optimizing test-time compute as a meta reinforcement learning (RL) problem, which provides a principled perspective on spending test-time compute through the lens of exploration and exploitation. This perspective becomes increasingly relevant as we scale test-time token budgets to be very large. It also motivates the use of cumulative regret to measure the efficacy of test-time compute, by viewing the output as a long stream consisting of several episodes from the model. Then, I will show that this formulation allows us to devise a finetuning paradigm that specifically aims to optimize intermediate tokens in a test-time output stream, using dense rewards based on how useful each step is (information gain). I will show that this is crucial for enabling the discovery of novel solutions on hard problems, and discuss how this plays out in practice. To contrast with our approach and to validate our analysis, I will present a simple ablation analysis of some state-of-the-art models (R1) that attempts to understand their behavior, and outline some ways of addressing the issues it reveals using information gain.
In the second part of the talk, I will turn to a more formal understanding and present some of our recent theoretical results, which prove that using RL with some form of verification is critical for scaling test-time compute effectively. We show that even if we were to train on expert search traces (e.g., via STaR or stream search), the suboptimality of expert cloning decays at a much slower rate, in both the amount of data and the test-time token budget, than that of running RL, as long as the base pre-trained LLM admits a heterogeneous distribution, which also puts sufficient mass on trajectories that attain somewhat important. I will show some implications of this result. Overall, this work solidifies the belief that RL is needed for optimizing test-time compute and, coupled with the first part, presents a new way to think about this topic.
This talk is based on the blog post at https://blog.ml.cmu.edu/2025/01/08/optimizing-llm-test-time-compute-inv…, a paper that extends that blog post, and another paper on theoretical studies of verification vs. generation.
Transformers have now been scaled to vast amounts of static data. This approach has been so successful it has forced the research community to ask, "What's next?". This workshop will bring together researchers thinking about questions related to the future...
Lawrence Roy is currently a postdoc at Aarhus University's Crypto group. He was a DOE CSGF fellow during his PhD at Oregon State University, where he was advised by Mike Rosulek. His research focuses on secure multi-party computation, as well as related...
Nicholas Meade is a PhD student at McGill University and Mila and is supervised by Siva Reddy. Broadly, his research is focused on analyzing and mitigating safety issues with Large Language Models (LLMs). He has investigated the susceptibility of LLMs to...
Arkil is a PhD student at Mila and McGill University, advised by Siva Reddy and Dzmitry Bahdanau. Previously, he has interned at the Allen Institute for AI (AllenNLP) and worked as a Research Fellow at Microsoft Research India. His research focuses on...