About

As the landscape of artificial intelligence evolves, ensuring the safety and alignment of superintelligent large language models (LLMs) is paramount. This workshop will delve into the theoretical foundations of LLM safety, including topics such as the Bayesian view of LLM safety, the reinforcement learning (RL) view of safety, and other theoretical perspectives.

The flavor of this workshop is futuristic, focusing on how to ensure that a superintelligent LLM or AI system remains safe and aligned with humans. The workshop is a joint effort of the Simons Institute and IVADO.

Key Topics:

  • Bayesian Approaches to LLM Safety
  • Reinforcement Learning Perspectives on Safety
  • Theoretical Frameworks for Ensuring AI Alignment
  • Case Studies and Practical Implications
  • Future Directions in LLM Safety Research

Chairs/Organizers