Abstract

As LLM capabilities improve rapidly across a range of domains (including ones their designers didn’t intend), it becomes increasingly challenging to rule out catastrophic harms. I’ll argue for the need to make affirmative safety cases for LLMs. Once LLMs are capable of carrying out complex autonomous plans, understanding their motivational structures becomes central to safety. I’ll highlight the need for a science of LLM generalization, so that we can understand how training data shapes a model’s beliefs and motivations.