Safety-Guaranteed LLMs

Program

Special Year on Large Language Models and Transformers, Part 2

Location

Calvin Lab auditorium

Date

Monday, Apr. 14 – Friday, Apr. 18, 2025

Workshops

Simulating Counterfactual Training

Controlling Untrusted AIs With Monitors

Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?

Amortised Inference Meets LLMs: Algorithms And Implications For Faithful Knowledge Extraction

Richard M. Karp Distinguished Lectures

Panel Discussion

Workshops

Robustness of jailbreaking across aligned LLMs, reasoning models and agents

Adversarial Robustness of LLMs' Safety Alignment

Antidistillation Sampling

Causal Representation Learning: A Natural Fit for Mechanistic Interpretability

Out Of Distribution, Out Of Control? Understanding Safety Challenges In AI
