
Abstract
Weak-to-strong generalization refers to the ability of a reasoning model to solve "harder" problems than those in its training set. I'll argue that recurrent architectures, in which networks can dynamically scale the amount of computation used to solve a problem, are necessary to achieve dramatic weak-to-strong behavior. I'll present examples where recurrent networks exhibit weak-to-strong generalization on a range of simple reasoning problems. Then I'll show that transformer-based LLMs benefit from recurrence as well, boosting their performance on weak-to-strong arithmetic tasks.
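
To make the idea of dynamically scaling computation concrete, here is a minimal sketch of a depth-recurrent network in PyTorch: one shared block is applied a variable number of times, so the test-time compute budget can be raised for harder inputs. This is an illustrative assumption, not the speaker's implementation; all module names, sizes, and iteration counts below are hypothetical.

```python
# Minimal sketch of a depth-recurrent model (hypothetical, for illustration only).
# A single "core" block is reused at every step, so compute can be scaled at
# inference time by simply running more iterations than were used in training.
import torch
import torch.nn as nn


class RecurrentReasoner(nn.Module):
    def __init__(self, d_in: int = 32, d_model: int = 256, d_out: int = 10):
        super().__init__()
        self.embed = nn.Linear(d_in, d_model)      # project raw input features
        self.core = nn.GRUCell(d_model, d_model)   # the block reused at every step
        self.readout = nn.Linear(d_model, d_out)   # task-specific output head

    def forward(self, x: torch.Tensor, n_iters: int) -> torch.Tensor:
        # n_iters controls the compute budget: train with a modest value,
        # then increase it at test time to attempt harder problems.
        inp = self.embed(x)
        h = torch.zeros(x.size(0), self.core.hidden_size, device=x.device)
        for _ in range(n_iters):
            h = self.core(inp, h)  # same weights applied repeatedly
        return self.readout(h)


if __name__ == "__main__":
    model = RecurrentReasoner()
    x = torch.randn(4, 32)
    easy = model(x, n_iters=8)    # training-time compute budget
    hard = model(x, n_iters=64)   # scaled-up compute at inference
    print(easy.shape, hard.shape)
```

The key design choice in this sketch is weight sharing across iterations: because the same block is applied at every step, the number of steps is a free knob at inference, which is what lets a recurrent model spend more computation on out-of-distribution, harder instances.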