Speculations on Test-Time Scaling | Richard M. Karp Distinguished Lecture


The “large” in LLM is more foundational than descriptive: models improve predictably as they grow. Increases in parameters and data lead to reliable increases in accuracy. Recent results from OpenAI demonstrate that a third axis, test-time compute, exhibits similar scaling properties in specific domains. While the details of this method are not yet public, the result is critical for future LLM design. In his Richard M. Karp Distinguished Lecture last month, Sasha Rush (Cornell Tech) surveyed the literature on test-time compute and model self-improvement, and discussed the expected implications of test-time scaling. The talk also briefly connected these research directions to current open-source efforts to build effective reasoning models.
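As a concrete, purely illustrative picture of what spending more test-time compute can mean, the sketch below implements majority voting over repeated samples. The `sample_answer` function is a hypothetical stand-in for a stochastic LLM call, and nothing here is drawn from the specific methods discussed in the lecture; it only shows how accuracy can climb as more samples are drawn at inference time.

```python
# Illustrative sketch only: test-time compute via repeated sampling + majority vote.
# "sample_answer" is a hypothetical stand-in for a stochastic LLM call.
import random
from collections import Counter


def sample_answer(question: str) -> str:
    """Stand-in model: answers correctly 40% of the time,
    otherwise returns one of several distinct wrong answers."""
    if random.random() < 0.4:
        return "correct"
    return random.choice(["wrong_a", "wrong_b", "wrong_c"])


def majority_vote(question: str, n_samples: int) -> str:
    """Spend more test-time compute by drawing n_samples and voting."""
    answers = [sample_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]


def accuracy(n_samples: int, trials: int = 2000) -> float:
    """Estimate accuracy of the voted answer over many trials."""
    hits = sum(majority_vote("q", n_samples) == "correct" for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    random.seed(0)
    # Accuracy rises as the test-time sample budget grows.
    for n in (1, 4, 16, 64):
        print(f"samples={n:3d}  accuracy={accuracy(n):.3f}")
```

In this toy setup the single-sample accuracy sits near 40%, but because the correct answer is the plurality choice, voting over more samples pushes accuracy toward 100%; the point is only that an inference-time budget can act as a third scaling axis alongside parameters and data.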
