Transformers as a Computational Model
Program: Special Year on Large Language Models and Transformers, Part 1
Location: Calvin Lab auditorium
Date: Monday, Sept. 23 – Friday, Sept. 27, 2024
Schedule
All talks listed in Pacific Time. Schedule subject to change.
Monday, Sept. 23, 2024
9–9:25 a.m. | Coffee and Check-In
9:25–9:30 a.m. | Opening Remarks
9:30–10:15 a.m. | Transformers, parallel computation, and logarithmic depth | Daniel Hsu (Columbia University)
10:15–10:30 a.m. | Break
10:30–11:15 a.m. | Capabilities and limitations of Transformers in sequential reasoning | Bingbin Liu (Carnegie Mellon University)
11:15–11:30 a.m. | Break
11:30 a.m.–12:15 p.m. | The Parallelism Tradeoff: Understanding Transformer Expressivity Through Circuit Complexity | Will Merrill (New York University)
12:15–2 p.m. | Lunch (on your own)
2–2:45 p.m. | Emergence and grokking in "simple" architectures | Misha Belkin (UCSD)
2:45–3 p.m. | Break
3–3:45 p.m. | Iterated Models: Expressive Power, Learning, and Chain of Thought | Nati Srebro (Toyota Technological Institute at Chicago)
3:45–4:45 p.m. | Reception
Tuesday, Sept. 24, 2024
9–9:30 a.m. | Coffee and Check-In
9:30–10:15 a.m. | The emergence of clusters in self-attention dynamics | Philippe Rigollet (MIT)
10:15–10:30 a.m. | Break
10:30–11 a.m. | Computational Benefits and Limitations of Transformers and State-Space Models | Eran Malach (Kempner Institute, Harvard University)
11–11:30 a.m. | Break
11:30 a.m.–12:15 p.m. | Transformer Expressivity and Formal Logic | David Chiang (University of Notre Dame)
12:15–2 p.m. | Lunch (on your own)
2–2:45 p.m. | Interpretability Agents | Sarah Schwettmann (MIT)
2:45–3 p.m. | Break
3–3:45 p.m. | Recent Efficiency Improvements to Transformers | David Woodruff (Carnegie Mellon University)
3:45–4 p.m. | Break
4–5 p.m. | Panel Discussion: What can theory offer to the design and use of LLMs? | Jon Kleinberg, Philippe Rigollet, Nati Srebro, and Naomi Saphra
Wednesday, Sept. 25, 2024
9–9:30 a.m. | Coffee and Check-In
9:30–10:15 a.m. | Language Generation in the Limit | Jon Kleinberg (Cornell University)
10:15–10:30 a.m. | Break
10:30–11:15 a.m. | Do Large Language Models Perform Latent Reasoning? (Remote Talk) | Mor Geva (Tel Aviv University)
11:15–11:30 a.m. | Break
11:30 a.m.–12:15 p.m. | Language Acquisition in Language Models | Naomi Saphra (Kempner Institute at Harvard University)
12:15–2 p.m. | Lunch (on your own)
2–2:45 p.m. | What Was Revolutionized by the "Transformer Revolution"? | Stella Biderman (EleutherAI)
2:45–3 p.m. | Break
3–5 p.m. | Poster Session (2nd Floor Interactive Area)
Thursday, Sept. 26, 2024
9–9:30 a.m. | Coffee and Check-In
9:30–10:15 a.m. | Learning to Reason with LLMs | Noam Brown (OpenAI)
10:15–10:30 a.m. | Break
10:30–11:15 a.m. | Understanding and Improving Efficient Language Models | Simran Arora (Stanford University)
11:15–11:30 a.m. | Break
11:30 a.m.–12:15 p.m. | Using recurrence to achieve weak to strong generalization | Tom Goldstein (University of Maryland)
12:15–2 p.m. | Lunch (on your own)
2–2:45 p.m. | Towards Understanding Modern Alchemy | Ekin Akyurek (MIT)
2:45–3 p.m. | Break
3–3:45 p.m. | A Retrieval-based Language Model at Scale (Remote Talk) | Sewon Min (UC Berkeley & AI2)
3:45–4 p.m. | Break
4–5 p.m. | Panel Discussion: Are Transformers the end game? | Simran Arora, Stella Biderman, Jitendra Malik, and Andrew Wilson
Friday, Sept. 27, 2024
9–9:30 a.m. | Coffee and Check-In
9:30–10:15 a.m. | On the Tradeoffs of State Space Models | Albert Gu (Carnegie Mellon University)
10:15–10:30 a.m. | Break
10:30–11:15 a.m. | Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Andrew Gordon Wilson (New York University)
11:15–11:30 a.m. | Break
11:30 a.m.–12:15 p.m. | Exact solutions to the geometric dynamics of signal propagation through transformers predict their trainability | Surya Ganguli (Stanford University)
12:15–2 p.m. | Lunch (on your own)
2–2:45 p.m. | Using Algorithms to Understand Transformers (and Using Transformers to Understand Algorithms) | Vatsal Sharan (University of Southern California)
2:45–3 p.m. | Break
3–3:45 p.m. | Associative memories as a building block in Transformers | Alberto Bietti (Flatiron Institute)
3:45–5 p.m. | Hike