Safety-Guaranteed LLMs
Program: Special Year on Large Language Models and Transformers, Part 2
Location: Calvin Lab auditorium
Date: Monday, Apr. 14 – Friday, Apr. 18, 2025
Schedule
Monday, Apr. 14, 2025
9–9:15 a.m. | Coffee and Check-In
9:15–9:30 a.m. | Welcome Address
9:30–10:30 a.m. | Simulating Counterfactual Training | Roger Grosse (University of Toronto)
10:30–11 a.m. | Break
11 a.m.–12 p.m. | AI Safety via Inference-Time Compute | Boaz Barak (Harvard University)
12–2 p.m. | Lunch (on your own)
2–3 p.m. | Controlling Untrusted AIs with Monitors | Ethan Perez (Anthropic)
3–3:30 p.m. | Break
3:30–4:30 p.m. | Game-Theoretic Approaches to AI Safety | Georgios Piliouras (Google DeepMind + SUTD)
4:30–5:30 p.m. | Reception
Tuesday, Apr. 15, 2025
9:30–10 a.m. | Coffee and Check-In
10–11 a.m. | Full-Stack Alignment | Ryan Lowe (Meaning Alignment Institute)
11–11:30 a.m. | Break
11:30 a.m.–12:30 p.m. | Can We Get Asymptotic Safety Guarantees Based on Scalable Oversight? | Geoffrey Irving (UK AI Safety Institute)
12:30–2:30 p.m. | Lunch (on your own)
2:30–3:30 p.m. | Amortised Inference Meets LLMs: Algorithms and Implications for Faithful Knowledge Extraction | Nikolay Malkin (University of Edinburgh)
3:30–4 p.m. | Break
4–5 p.m. | Superintelligent Agents Pose Catastrophic Risks — Can Scientist AI Offer a Safer Path? (Richard M. Karp Distinguished Lecture) | Yoshua Bengio (IVADO - Mila - Université de Montréal)
5–6 p.m. | Panel Discussion | Yoshua Bengio (IVADO - Mila - Université de Montréal), Dawn Song (UC Berkeley), Roger Grosse, Geoffrey Irving, Siva Reddy (IVADO - Mila - McGill University)
Wednesday, Apr. 16, 2025
8:30–9 a.m. | Coffee and Check-In
9–10 a.m. | Robustness of Jailbreaking Across Aligned LLMs, Reasoning Models, and Agents | Siva Reddy (IVADO - Mila - McGill University)
10–10:15 a.m. | Break
10:15–11:15 a.m. | Adversarial Robustness of LLMs' Safety Alignment | Gauthier Gidel (IVADO - Mila - Université de Montréal)
11:15–11:30 a.m. | Break
11:30 a.m.–12:30 p.m. | Antidistillation Sampling | Zico Kolter (Carnegie Mellon University)
12:30–2 p.m. | Lunch (on your own)
2–3 p.m. | Causal Representation Learning: A Natural Fit for Mechanistic Interpretability | Dhanya Sridhar (IVADO + Université de Montréal + Mila)
3–3:15 p.m. | Break
3:15–4:15 p.m. | Out of Distribution, Out of Control? Understanding Safety Challenges in AI | Aditi Raghunathan (Carnegie Mellon University)
Thursday, Apr. 17, 2025
9–9:30 a.m. | Coffee and Check-In
9:30–10:30 a.m. | LLM Negotiations and Social Dilemmas | Aaron Courville (IVADO + Université de Montréal + Mila)
10:30–11 a.m. | Break
11 a.m.–12 p.m. | Scalably Understanding AI with AI | Jacob Steinhardt (UC Berkeley)
12–1:45 p.m. | Lunch (on your own)
1:45–2:45 p.m. | Future Directions in AI Safety Research | Dawn Song (UC Berkeley)
2:45–3 p.m. | Break
3–4 p.m. | What Can Theory of Cryptography Tell Us About AI Safety? | Shafi Goldwasser (Simons Institute, UC Berkeley)
4–5 p.m. | Assessing the Risk of Advanced Reinforcement Learning Agents Causing Human Extinction | Michael Cohen (UC Berkeley)
Friday, Apr. 18, 2025
8:30–9 a.m. | Coffee and Check-In
9–10 a.m. | Safeguarded AI Workflows | David Dalrymple (MIT)
10–10:15 a.m. | Break
10:15–11:15 a.m. | AI Safety: LLMs, Facts, Lies, and Agents in the Real World | Christopher Pal (IVADO + Polytechnique Montréal + Université de Montréal + Mila)
11:15–11:30 a.m. | Break
11:30 a.m.–12:30 p.m. | Measurements for Capabilities and Hazards | Dan Hendrycks (Center for AI Safety)
12:30–2 p.m. | Lunch (on your own)
2–3 p.m. | Theoretical and Empirical Aspects of Singular Learning Theory for AI Alignment | Daniel Murfet (Timaeus)
3–3:30 p.m. | Break
3:30–4:30 p.m. | Probabilistic Safety Guarantees Using Model Internals | Jacob Hilton (Alignment Research Center)
4:30–4:45 p.m. | Closing Remarks