Abstract
We present recent theoretical developments that elucidate the learning capabilities of Transformers, focusing on in-context learning as the main subject. First, regarding statistical efficiency and approximation ability, we show that Transformers can achieve minimax optimality for in-context learning, and demonstrate their superiority over non-pretrained methods. Next, in terms of optimization theory, we show that nonlinear feature learning for in-context learning can be carried out with optimization guarantees. More concretely, the objective satisfies a strict-saddle property in a mean-field setting, and if the target is a single index model, its computational efficiency can be characterized by the information exponent of the true function.
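To fix terminology, the following is the standard definition of a single index model and its information exponent from the feature-learning literature (the notation here is generic, not taken from the talk itself): the information exponent is the index of the first nonzero Hermite coefficient of the link function, and it governs the sample or iteration complexity of gradient-based learning of the hidden direction.

```latex
% Single index model: Gaussian input, unknown direction \theta on the sphere
f_*(x) = \sigma(\langle x, \theta \rangle), \qquad
x \sim \mathcal{N}(0, I_d), \quad \theta \in \mathbb{S}^{d-1}.

% Information exponent of the link function \sigma,
% defined via its Hermite expansion \sigma(z) = \sum_{k \ge 0} \alpha_k \mathrm{He}_k(z):
k^* \;=\; \min\bigl\{\, k \ge 1 : \alpha_k = \mathbb{E}_{z \sim \mathcal{N}(0,1)}\bigl[\sigma(z)\,\mathrm{He}_k(z)\bigr] \neq 0 \,\bigr\}.
```

For example, $\sigma(z) = z$ has $k^* = 1$, while an even link such as $\sigma(z) = z^2$ has $k^* = 2$; larger $k^*$ makes recovering $\theta$ harder for gradient-based methods.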