About

This is the first part in a special, yearlong program on large language models and transformers that spans the 2024–2025 academic year. This program is inspired by the success of the Simons Institute's workshop on Large Language Models and Transformers, held in August 2023.

This program's overarching goal is to understand the ongoing revolution in transformers and large language models (LLMs) through a wide lens, in a relaxed setting that facilitates discussion, debate, and intellectual cross-pollination. At a conceptual level, LLMs profoundly change the landscape for theories of human language, of the brain and computation, and of the nature of human intelligence. In linguistics, they provide a new way to think about grammar, semantics, and conceptual representation. In neuroscience, vector models provide a new approach to computational models of the brain. In cognitive science, they challenge our notions of what the essential elements of human intelligence are.

This program will explore very concrete questions about transformers as models of computation. These include algorithmic ideas to reduce the complexity of training to nearly linear in the length of the input, as well as scaling laws that describe how cross-entropy loss scales with model size, data set size, and amount of compute. The program will also explore how scaling laws might help in understanding high-level outcomes such as the emergence of complex skills in LLMs.

At a practical level, it is clear that LLMs will have a profound impact on human society, and issues of alignment, trust, and security will play a central role. Alignment refers to the gap between complex human values and the mechanisms that drive AI decision-making. Related issues include trustworthiness (How do we know the model will do what it’s intended to?), interpretability (Can we identify with certainty why a machine learning algorithm delivers a specific answer?), safety (Can we safeguard against destructive actions by ML algorithms or humans using them?), security (Can we protect data and systems from adversaries?), and fairness (Can we safeguard against bias?). The legal and regulatory dimension of technological developments in AI, as well as its practical interaction with the capabilities and design of large language models, will be another key area of inquiry.

Three workshops will take place in Fall 2024; two more will take place in Spring 2025, during Part 2 of this program.

Organizers

Long-Term Participants (including Organizers)

Zhiyuan Li (Toyota Technological Institute at Chicago (TTIC))

Research Fellows

Enric Boix (Massachusetts Institute of Technology)

Visiting Graduate Students and Postdocs

Sarah Ball (Ludwig-Maximilians-Universität München)