Transformers, parallel computation, and logarithmic depth

Workshop

Transformers as a Computational Model

Speaker(s)

Daniel Hsu (Columbia University)

Location

Calvin Lab Auditorium

Date

Monday, Sept. 23, 2024

Time

9:30 – 10:15 a.m. PT

Abstract

We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers. This is joint work with Clayton Sanford (Google) and Matus Telgarsky (NYU).
