
Abstract
Activation functions play a pivotal role in deep neural networks, enabling them to tackle complex tasks such as image recognition. However, they also introduce significant challenges for deep learning theory, for the analysis of network dynamics, and for properties such as interpretability and privacy. In this talk, we revisit the necessity of activation functions, especially in cases where high-order interactions among the input elements are available, as in the attention mechanism. Specifically, we highlight how high-order interactions are sufficient for retaining the necessary expressivity. Yet the question remains: is this expressivity alone sufficient for effective learning? We present networks that achieve strong performance both on demanding static tasks, such as ImageNet recognition, and on sequence-to-sequence tasks, such as arithmetic and language modeling.
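
To make the idea concrete, below is a minimal sketch (not the speaker's actual architecture) of how a layer can obtain nonlinearity from multiplicative, second-order interactions between projections of the input rather than from a pointwise activation such as ReLU or GELU; it assumes PyTorch, and the class name and dimensions are illustrative only.

```python
import torch
import torch.nn as nn


class MultiplicativeBlock(nn.Module):
    """Hypothetical activation-free block: nonlinearity comes from an
    elementwise (Hadamard) product of two linear projections of the input,
    i.e. a degree-2 polynomial interaction, not from ReLU/GELU."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each output coordinate is a second-order polynomial of the input;
        # the residual connection lets the interaction order grow with depth.
        return self.out(self.proj_a(x) * self.proj_b(x)) + x


if __name__ == "__main__":
    block = MultiplicativeBlock(dim=16)
    y = block(torch.randn(4, 16))
    print(y.shape)  # torch.Size([4, 16])
```

Stacking such blocks yields higher-order polynomial interactions of the input, which is one way networks without conventional activation functions can retain expressivity.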