Abstract

Recently, the Linear Transformer model has received considerable attention as a proxy for understanding the behavior of full-fledged Transformers. In particular, a number of papers have provided theoretical proofs that Linear Transformers can learn the linear regression task in-context by implementing gradient-based optimization in their forward pass. These results shed light on the mechanism through which Transformers learn in-context in practice. In addition to covering these papers, I will also discuss some interesting empirical observations suggesting that the optimization landscape of Linear Transformers may itself provide a good approximation for understanding the optimization of real Transformers.
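
To make the mechanism concrete, below is a minimal NumPy sketch (my own illustration, not taken from the talk or the papers it covers) in the spirit of the standard construction from this literature: a single linear self-attention layer with hand-picked weights whose output on a query token matches the prediction of one gradient-descent step, from zero initialization, on the in-context least-squares loss. The dimensions, step size eta, and the specific weight matrices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 16, 0.5                   # illustrative sizes and step size

# In-context linear regression data: y_i = w*^T x_i.
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))              # context inputs x_1..x_n
y = X @ w_star                           # context targets y_1..y_n
x_q = rng.normal(size=d)                 # query input (label unknown)

# Reference: one GD step on L(w) = (1/2n) * sum_i (w^T x_i - y_i)^2,
# starting from w = 0. The gradient at 0 is -(1/n) X^T y, so
# w_1 = (eta/n) X^T y.
w_gd = (eta / n) * (X.T @ y)
pred_gd = w_gd @ x_q

# Tokens z_i = (x_i, y_i); the query token z_q = (x_q, 0) has its
# unknown label slot zeroed.
Z = np.hstack([X, y[:, None]])           # (n, d+1) context tokens
z_q = np.concatenate([x_q, [0.0]])       # (d+1,) query token

# Hand-picked projections (assumed for this sketch): keys and queries
# read the x-part of a token, values read the (eta/n)-scaled y-part.
W_K = np.eye(d + 1)[:d]                  # (d, d+1): z -> x
W_Q = np.eye(d + 1)[:d]                  # (d, d+1): z -> x
W_V = (eta / n) * np.eye(d + 1)[-1:]     # (1, d+1): z -> (eta/n) * y

# Linear attention (no softmax) over the context tokens:
# sum_i (W_V z_i) * (W_K z_i)^T (W_Q z_q).
keys = Z @ W_K.T                         # (n, d): rows are x_i
values = Z @ W_V.T                       # (n, 1): rows are (eta/n) * y_i
scores = keys @ (W_Q @ z_q)              # (n,):   x_i^T x_q
pred_attn = float(values[:, 0] @ scores)

assert np.allclose(pred_gd, pred_attn)
print(f"one GD step:      {pred_gd:.6f}")
print(f"linear attention: {pred_attn:.6f}")
```

Running the script prints two identical predictions; the point is only that a linear attention layer can realize the gradient-step computation exactly, which is the core of the theoretical results discussed above.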

Video Recording