Safety-Guaranteed LLMs
Program: Special Year on Large Language Models and Transformers, Part 2
Location: Calvin Lab auditorium
Date: Monday, Apr. 14 – Friday, Apr. 18, 2025

Schedule

Monday, Apr. 14, 2025

9–9:15 a.m.  Coffee and Check-In
9:15–9:30 a.m.  Welcome Address
9:30–10:30 a.m.  Simulating Counterfactual Training | Roger Grosse (University of Toronto) [video]
10:30–11 a.m.  Break
11 a.m.–12 p.m.  AI Safety via Inference-Time Compute | Boaz Barak (Harvard University)
12–2 p.m.  Lunch (on your own)
2–3 p.m.  Controlling Untrusted AIs with Monitors | Ethan Perez (Anthropic) [video]
3–3:30 p.m.  Break
3:30–4:30 p.m.  Game-Theoretic Approaches to AI Safety | Georgios Piliouras (Google DeepMind / SUTD)
4:30–5:30 p.m.  Reception

Tuesday, Apr. 15, 2025

9:30–10 a.m.  Coffee and Check-In
10–11 a.m.  Full-Stack Alignment | Ryan Lowe (Meaning Alignment Institute)
11–11:30 a.m.  Break
11:30 a.m.–12:30 p.m.  Can We Get Asymptotic Safety Guarantees Based on Scalable Oversight? | Geoffrey Irving (UK AI Safety Institute) [video]
12:30–2:30 p.m.  Lunch (on your own)
2:30–3:30 p.m.  Amortised Inference Meets LLMs: Algorithms and Implications for Faithful Knowledge Extraction | Nikolay Malkin (University of Edinburgh) [video]
3:30–4 p.m.  Break
4–5 p.m.  Superintelligent Agents Pose Catastrophic Risks — Can Scientist AI Offer a Safer Path? | Richard M. Karp Distinguished Lecture | Yoshua Bengio (IVADO / Mila / Université de Montréal)
5–6 p.m.  Panel Discussion | Yoshua Bengio (IVADO / Mila / Université de Montréal), Dawn Song (UC Berkeley), Roger Grosse, Geoffrey Irving, Siva Reddy (IVADO / Mila / McGill University) [video]

Wednesday, Apr. 16, 2025

8:30–9 a.m.  Coffee and Check-In
9–10 a.m.  Robustness of Jailbreaking across Aligned LLMs, Reasoning Models, and Agents | Siva Reddy (IVADO / Mila / McGill University) [video]
10–10:15 a.m.  Break
10:15–11:15 a.m.  Adversarial Robustness of LLMs' Safety Alignment | Gauthier Gidel (IVADO / Mila / Université de Montréal) [video]
11:15–11:30 a.m.  Break
11:30 a.m.–12:30 p.m.  Antidistillation Sampling | Zico Kolter (Carnegie Mellon University) [video]
12:30–2 p.m.  Lunch (on your own)
2–3 p.m.  Causal Representation Learning: A Natural Fit for Mechanistic Interpretability | Dhanya Sridhar (IVADO / Université de Montréal / Mila) [video]
3–3:15 p.m.  Break
3:15–4:15 p.m.  Out of Distribution, Out of Control? Understanding Safety Challenges in AI | Aditi Raghunathan (Carnegie Mellon University) [video]

Thursday, Apr. 17, 2025

9–9:30 a.m.  Coffee and Check-In
9:30–10:30 a.m.  LLM Negotiations and Social Dilemmas | Aaron Courville (IVADO / Université de Montréal / Mila) [video]
10:30–11 a.m.  Break
11 a.m.–12 p.m.  Scalably Understanding AI with AI | Jacob Steinhardt (UC Berkeley) [video]
12–1:45 p.m.  Lunch (on your own)
1:45–2:45 p.m.  Future Directions in AI Safety Research | Dawn Song (UC Berkeley) [video]
2:45–3 p.m.  Break
3–4 p.m.  What Can Theory of Cryptography Tell Us about AI Safety? | Shafi Goldwasser (Simons Institute, UC Berkeley) [video]
4–5 p.m.  Assessing the Risk of Advanced Reinforcement Learning Agents Causing Human Extinction | Michael Cohen (UC Berkeley) [video]

Friday, Apr. 18, 2025

8:30–9 a.m.  Coffee and Check-In
9–10 a.m.  Safeguarded AI Workflows | David Dalrymple (MIT) [video]
10–10:15 a.m.  Break
10:15–11:15 a.m.  AI Safety: LLMs, Facts, Lies, and Agents in the Real World | Christopher Pal (IVADO / Polytechnique Montréal / Université de Montréal / Mila) [video]
11:15–11:30 a.m.  Break
11:30 a.m.–12:30 p.m.  Measurements for Capabilities and Hazards | Dan Hendrycks (Center for AI Safety) [video]
12:30–2 p.m.  Lunch (on your own)
2–3 p.m.  Theoretical and Empirical Aspects of Singular Learning Theory for AI Alignment | Daniel Murfet (Timaeus) [video]
3–3:30 p.m.  Break
3:30–4:30 p.m.  Probabilistic Safety Guarantees Using Model Internals | Jacob Hilton (Alignment Research Center) [video]
4:30–4:45 p.m.  Closing Remarks