Large language models (LLMs) have recently achieved remarkable results across a wide range of natural language processing (NLP) applications. As LLMs grow increasingly capable, the need to control their generated outputs becomes more pressing,...
In this talk, I address two pressing shortcomings of large language models (LLMs):
In the first part, I argue that it is useful to distinguish between common knowledge, meaning widely known information, and tail knowledge, which is highly specific and typically looked up rather than remembered. I introduce a simple pre-training strategy that separates tail knowledge from common knowledge and encourages the model to refrain from memorizing the former. Instead, at inference time the model learns to retrieve such information from an external database that is constructed during pre-training.
In the second part of the talk, I turn to the issue of controllability. I demonstrate that LLMs can be controlled effectively when guided by latent diffusion models. I explain the inner workings of diffusion models and how they can be adapted to generate latent states for autoregressive decoders, enabling more precise and reliable control over LLM outputs.
The advent of large language models (LLMs) such as ChatGPT forever changed the public perception of artificial intelligence. Our panel of experts will discuss why LLMs proved so surprising, even to researchers in the field, and why the explosion of increasingly powerful models has inevitably led to whispers about artificial general intelligence (AGI). We'll examine whether LLMs are sufficient to get us to AGI and, if not, what the missing ingredients might be.
Theoretically Speaking is a lecture series highlighting exciting advances in theoretical computer science for a broad general audience. Events are free and open to the public, with first-come, first-served seating. No special background is assumed. Registration is required. This lecture will be available to view afterward on this page and on our YouTube channel once captioning is complete.
Light refreshments will be provided before the talk, starting at 4:30 p.m.
The Simons Institute regularly captures photos and video of activity around the Institute for use in publications and promotional materials.
If you require special accommodation, please contact our access coordinator at simonsevents@berkeley.edu with as much advance notice as possible.
Large Reasoning Models like DeepSeek-R1 mark a fundamental shift in how LLMs approach complex problems. Instead of directly producing an answer for a given input, DeepSeek-R1 creates detailed multi-step reasoning chains, seemingly “thinking” about a problem before providing an answer. This reasoning process is publicly available to the user, creating many opportunities for studying the reasoning behaviour of the model and opening up the field of Thoughtology. Starting from a taxonomy of DeepSeek-R1’s basic building blocks of reasoning, our analyses of DeepSeek-R1 investigate the impact and controllability of thought length, the management of long or confusing contexts, cultural and safety concerns, and the status of DeepSeek-R1 vis-à-vis cognitive phenomena such as human-like language processing and world modelling. Our findings paint a nuanced picture. Notably, we show that DeepSeek-R1 has a ‘sweet spot’ of reasoning, beyond which extra inference time can impair model performance. Furthermore, we find that DeepSeek-R1 tends to persistently ruminate on previously explored problem formulations, obstructing further exploration. I will also present VinePPO, an effective reinforcement learning (RL) algorithm for improving reasoning abilities.