
Abstract
Activation functions play a pivotal role in deep neural networks, enabling them to tackle complex tasks such as image recognition. However, they also introduce significant challenges for deep learning theory, for the analysis of network dynamics, and for properties such as interpretability and privacy. In this talk, we revisit the necessity of activation functions, especially in cases where high-order interactions among the input elements are available, as in the attention mechanism. Specifically, we highlight how high-order interactions are sufficient for retaining the necessary expressivity. Yet the question remains: is this expressivity alone sufficient for effective learning? We present networks that achieve strong performance both on demanding static tasks, such as ImageNet recognition, and on sequence-to-sequence tasks, such as arithmetic and language modeling.
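
To make the idea concrete, below is a minimal sketch (not the speaker's actual architecture) of how a layer can obtain nonlinearity from multiplicative, second-order interactions between projections of the input rather than from a pointwise activation such as ReLU or GELU; it assumes PyTorch, and the class name and dimensions are illustrative only.

```python
import torch
import torch.nn as nn


class MultiplicativeBlock(nn.Module):
    """Hypothetical activation-free block: nonlinearity comes from an
    elementwise (Hadamard) product of two linear projections of the input,
    i.e. a degree-2 polynomial interaction, not from ReLU/GELU."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj_a = nn.Linear(dim, dim)
        self.proj_b = nn.Linear(dim, dim)
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each output coordinate is a second-order polynomial of the input;
        # the residual connection lets the interaction order grow with depth.
        return self.out(self.proj_a(x) * self.proj_b(x)) + x


if __name__ == "__main__":
    block = MultiplicativeBlock(dim=16)
    y = block(torch.randn(4, 16))
    print(y.shape)  # torch.Size([4, 16])
```

Stacking such blocks yields higher-order polynomial interactions of the input, which is one way networks without conventional activation functions can retain expressivity.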