Challenges in State-of-the-Art Bit-Precise Reasoning

Panel Discussion

Moderated by Siva Reddy.

Superintelligent Agents Pose Catastrophic Risks — Can Scientist AI Offer a Safer Path? | Richard M. Karp Distinguished Lecture

The leading AI companies are increasingly focused on building generalist AI agents — systems that can autonomously plan, act, and pursue goals across almost all tasks that humans can perform. Despite how useful these systems might be, unchecked AI agency poses significant risks to public safety and security, ranging from misuse by malicious actors to a potentially irreversible loss of human control. In this talk, Yoshua Bengio will discuss how these risks arise from current AI training methods.

Indeed, various scenarios and experiments have demonstrated the possibility of AI agents engaging in deception or pursuing goals that were not specified by human operators and that conflict with human interests, such as self-preservation. Following the precautionary principle, Bengio and his colleagues see a strong need for safer, yet still useful, alternatives to the current agency-driven trajectory. Accordingly, they propose as a core building block for further advances the development of a non-agentic AI system that is trustworthy and safe by design, which they call Scientist AI. This system is designed to explain the world from observations, as opposed to taking actions in it to imitate or please humans. It comprises a world model that generates theories to explain data and a question-answering inference machine. Both components operate with an explicit notion of uncertainty to mitigate the risks of overconfident predictions.

In light of these considerations, a Scientist AI could be used to assist human researchers in accelerating scientific progress, including in AI safety. In particular, this system could be employed as a guardrail against AI agents that might be created despite the risks involved. Ultimately, focusing on non-agentic AI may enable the benefits of AI innovation while avoiding the risks associated with the current trajectory. Bengio and his colleagues hope these arguments will motivate researchers, developers, and policymakers to favor this safer path.

Yoshua Bengio is a full professor in the Department of Computer Science and Operations Research at Université de Montréal, as well as the founder and scientific director of Mila and the scientific director of IVADO. He also holds a Canada CIFAR AI chair. Regarded as one of the world’s leaders in artificial intelligence and deep learning, he is the recipient of the 2018 A.M. Turing Award, often called the “Nobel Prize of computing.”

He is a fellow of both the U.K.’s Royal Society and the Royal Society of Canada, an officer of the Order of Canada, a knight of the Legion of Honor of France, and a member of the U.N.’s Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology.

This talk will be followed by a panel discussion from 5 – 6 p.m.

The Richard M. Karp Distinguished Lectures were created in Fall 2019 to celebrate the role of Simons Institute Founding Director Dick Karp in establishing the field of theoretical computer science, formulating its central problems, and contributing stunning results in the areas of computational complexity and algorithms. Formerly known as the Simons Institute Open Lectures, the series features visionary leaders in the field of theoretical computer science and is geared toward a broad scientific audience.

Light refreshments will be available at 3:30 p.m., prior to the start of the lecture.

The lecture recording URL will be emailed to registered participants. This URL can be used for immediate access to the livestream and recorded lecture. Lecture recordings will be publicly available on SimonsTV about 12 to 15 days following each presentation unless otherwise noted.

The Simons Institute regularly captures photos and video of activity around the Institute for use in publications and promotional materials.

If you require special accommodation, please contact our access coordinator at simonsevents@berkeley.edu with as much advance notice as possible.

Amortised Inference Meets LLMs: Algorithms And Implications For Faithful Knowledge Extraction

Many problems in large language models -- constrained generation, reasoning and planning, information extraction, alignment with human feedback -- can be understood as Bayesian inference tasks involving intractable posterior sampling under an (autoregressive or diffusion) LLM prior. I survey various such problems and the methods from the probabilistic inference literature that can be used to solve them, including Monte Carlo methods, amortised variational inference using deep RL, and hybrid techniques, and their benefits for learning generalisable yet uncertainty-aware reasoners and planners. Conversely, extracting structured knowledge -- such as relational or causal information -- from pretrained language models presents challenges due to the inherent limitations of prompting methods, which lack guarantees of logical or distributional consistency and calibrated uncertainty measures. I hypothesise that the same amortised inference techniques can enable faithful extraction of structured knowledge from LLMs, by constructing a symbolic structure that is consistent with a fixed language model's predictions. I conclude by discussing the implications of this direction of research for the development of aligned and probabilistically-guaranteed-safe AI systems.
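To make the "posterior sampling under an LLM prior" framing concrete, here is a minimal sketch, not taken from the talk. It assumes a hypothetical two-token autoregressive prior standing in for an LLM and a hypothetical binary constraint standing in for the reward, so the target is the posterior proportional to prior(sequence) times reward(sequence). It compares the exact posterior, found by enumeration, with a self-normalised importance sampling estimate that uses the prior as the proposal. With a 0/1 reward this amounts to rejection sampling. All names (`prior_next`, `reward`, `VOCAB`, `LENGTH`) are illustrative assumptions.

```python
import itertools
import random

VOCAB = ["a", "b"]   # hypothetical toy vocabulary
LENGTH = 3           # hypothetical fixed sequence length


def prior_next(prefix):
    """Toy autoregressive prior P(next token | prefix); a stand-in for an LLM."""
    if not prefix:
        return {"a": 0.5, "b": 0.5}
    last = prefix[-1]
    # Repeat the previous token with probability 0.7.
    return {t: (0.7 if t == last else 0.3) for t in VOCAB}


def prior_prob(seq):
    """Probability of a full sequence under the toy prior."""
    p = 1.0
    for i, tok in enumerate(seq):
        p *= prior_next(seq[:i])[tok]
    return p


def reward(seq):
    """Hypothetical constraint: 1 if the sequence has exactly one 'b', else 0."""
    return 1.0 if seq.count("b") == 1 else 0.0


def exact_posterior():
    """Enumerate all sequences; feasible only because the toy space is tiny."""
    seqs = [tuple(s) for s in itertools.product(VOCAB, repeat=LENGTH)]
    weights = {s: prior_prob(s) * reward(s) for s in seqs}
    z = sum(weights.values())  # normalising constant, intractable for a real LLM
    return {s: w / z for s, w in weights.items() if w > 0}


def sample_prior(rng):
    """Draw one sequence token by token from the toy prior."""
    seq = ()
    for _ in range(LENGTH):
        probs = prior_next(seq)
        tok = rng.choices(VOCAB, weights=[probs[t] for t in VOCAB])[0]
        seq += (tok,)
    return seq


def importance_posterior(n, seed=0):
    """Self-normalised importance sampling with the prior as the proposal."""
    rng = random.Random(seed)
    totals = {}
    for _ in range(n):
        s = sample_prior(rng)
        w = reward(s)  # importance weight = target / proposal = reward
        if w > 0:
            totals[s] = totals.get(s, 0.0) + w
    z = sum(totals.values())
    return {s: w / z for s, w in totals.items()}


if __name__ == "__main__":
    exact = exact_posterior()
    estimate = importance_posterior(20000)
    for s, p in sorted(exact.items()):
        print("".join(s), round(p, 3), round(estimate.get(s, 0.0), 3))
```

Sampling from the prior wastes every draw that violates the constraint, and that waste grows quickly with sequence length. This is one motivation for amortised methods, which train a sampler to target the posterior directly.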

Can We Get Asymptotic Safety Guarantees Based On Scalable Oversight?

Scalable oversight attempts to align AI systems to human values by training AI models based on human feedback and using AI assistance to strengthen that human feedback signal. This talk will cover:

1. Recent theoretical work applying tools from computational complexity, multi-agent training dynamics, and learning theory to design improved scalable oversight methods which achieve theoretical guarantees given simplified assumptions about human feedback.
2. Prospects for extending such methods to weaker (and thus more realistic) assumptions about human feedback, and stronger requirements on solutions.
3. Prospects for integrating these developments into practical ML training.

For (1), we have a new "prover-predictor game" variant of debate which (in a theoretical setting with sufficiently strong assumptions) avoids the "obfuscated arguments" problem discovered during human participant scalable oversight experiments in 2020. Previous versions of debate either assumed infinitely powerful agents or required computational complexity proportional to the length of a human-checkable argument. The new method allows ML systems to spend time related to the length of an ML-checkable argument, which can be much shorter if superhuman heuristics are involved.

For (2), the talk will lay out some sources of optimism in the hopes of encouraging more work in this area. There are concrete theoretical limitations in the current methods which may be addressable using tools from theory. It is not clear that this work will succeed, but it is importantly orthogonal to much of the safety research occurring at AI labs today, and I believe there are strong prospects for bringing new ideas from other areas of theoretical computer science which have not yet been applied to AI safety.

For (3), the new method has the structure of a zero-sum, adversarial team game, and both theoretical and practical evidence shows that such games admit practical, convergent training methods. Importantly, while the asymptotic guarantees provided by this type of theory are weaker than full verification, they may also be more likely to translate into practice.

Full-Stack Alignment

Abstract not available.

Game Theoretic Approaches to AI Safety

Abstract not available.

Controlling Untrusted AIs With Monitors

Abstract not available.

AI Safety Via Inference-Time Compute

Abstract not available.