![Large Language Models and Transformers: Part 1 (FALL)](/sites/default/files/styles/workshop_banner_sm_1x/public/2024-07/LLM_Fall.jpg?h=557a15ea&itok=A41iD4he)
Abstract
The quadratic time required to compute attention layers in transformers is a major bottleneck for long context lengths. I will survey recent approximation algorithms based on dimensionality reduction, which, under certain assumptions, achieve linear time. I will focus on HyperAttention and PolySketchFormer, discussing their theory and practice, and also mention recent follow-up work.
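To illustrate the bottleneck the talk addresses, here is a minimal sketch, not the HyperAttention or PolySketchFormer algorithm itself: exact softmax attention materializes an n × n score matrix (quadratic in the sequence length), while replacing the softmax kernel with an explicit feature map lets the matrix products be reassociated so that no n × n matrix is ever formed. The feature map `phi` below is a hypothetical stand-in; the actual sketching constructions and their guarantees are what the talk covers.

```python
import numpy as np

def exact_attention(Q, K, V):
    # Exact softmax attention: forming the n x n score matrix costs O(n^2 * d).
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_time_attention(Q, K, V, feature_map):
    # Kernelized approximation: with phi(Q) @ phi(K).T in place of the softmax
    # kernel, we compute phi(K).T @ V first (an r x d matrix) and never build
    # the n x n matrix, giving O(n * r * d) time.
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, r) feature matrices
    kv = Kf.T @ V                             # (r, d), computed once
    normalizer = Qf @ Kf.sum(axis=0)          # (n,) row normalization
    return (Qf @ kv) / normalizer[:, None]

# Hypothetical positive feature map used only for illustration.
phi = lambda X: np.maximum(X, 0.0) + 1e-6

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_time_attention(Q, K, V, phi).shape)  # (1024, 64)
```

The point of the reassociation is that the cost scales with n rather than n²; how close such an approximation stays to exact softmax attention depends on the choice of feature map, which is where the dimensionality-reduction results discussed in the talk come in.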