
Abstract
In this talk, I address two pressing shortcomings of large language models (LLMs): their memorization of rarely needed tail knowledge and the limited controllability of their outputs.
In the first part, I argue that it is useful to distinguish between common knowledge (widely known information) and tail knowledge (highly specific facts that are typically looked up rather than remembered). I introduce a simple pre-training strategy that separates tail knowledge from common knowledge and encourages the model to refrain from memorizing the former. Instead, the model learns to retrieve such information during inference from an external database that is constructed during pre-training.
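As a purely illustrative sketch (not the method presented in the talk), the split can be approximated in Python with a frequency threshold standing in for the tail criterion and a simple key-value store standing in for the external database; the toy corpus, the threshold, and the lookup logic below are all assumptions for exposition:

    from collections import Counter

    # Toy corpus of (subject, relation, object) facts standing in for pre-training text.
    corpus = [
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "France"),
        ("Paris", "capital_of", "France"),
        ("Kvikkjokk", "located_in", "Sweden"),   # rare, tail-like fact
    ]

    TAIL_THRESHOLD = 2  # facts seen fewer times than this count as tail knowledge (assumption)

    counts = Counter(corpus)
    common_facts = [f for f in corpus if counts[f] >= TAIL_THRESHOLD]
    tail_db = {(s, r): o for (s, r, o), c in counts.items() if c < TAIL_THRESHOLD}

    # "Pre-training" only sees common facts; tail facts are withheld from memorization
    # and stored in the external database built alongside pre-training.
    train_set = common_facts

    def answer(subject: str, relation: str) -> str:
        """At inference, fall back to the external database for tail queries."""
        key = (subject, relation)
        if key in tail_db:                       # retrieved, not memorized
            return tail_db[key]
        for s, r, o in train_set:                # toy stand-in for parametric knowledge
            if (s, r) == key:
                return o
        return "<unknown>"

    print(answer("Kvikkjokk", "located_in"))     # -> Sweden, via retrieval
    print(answer("Paris", "capital_of"))         # -> France, via "memorized" knowledge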
In the second part of the talk, I turn to the issue of controllability. I demonstrate that LLMs can be controlled effectively when guided by latent diffusion models. I explain the inner workings of diffusion models and how they can be adapted to generate latent states for autoregressive decoders, enabling more precise and reliable control over LLM outputs.
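To make the second part more concrete, the following Python sketch shows how a DDPM-style reverse diffusion loop could produce a latent vector that then conditions a toy autoregressive decoder; the noise schedule, the untrained stand-in denoiser, and the miniature decoder are assumptions made for exposition, not the architecture presented in the talk:

    import numpy as np

    rng = np.random.default_rng(0)
    LATENT_DIM, STEPS = 16, 50

    # Linear noise schedule; alpha_bar is the cumulative product used in DDPM-style sampling.
    betas = np.linspace(1e-4, 0.02, STEPS)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def denoiser(z_t: np.ndarray, t: int) -> np.ndarray:
        """Stand-in for a trained noise-prediction network (here: an untrained random map)."""
        W = rng.standard_normal((LATENT_DIM, LATENT_DIM)) * 0.01
        return W @ z_t

    def sample_latent() -> np.ndarray:
        """Run the reverse diffusion process to produce one latent vector."""
        z = rng.standard_normal(LATENT_DIM)          # start from pure noise
        for t in reversed(range(STEPS)):
            eps_hat = denoiser(z, t)
            coef = (1 - alphas[t]) / np.sqrt(1 - alpha_bars[t])
            z = (z - coef * eps_hat) / np.sqrt(alphas[t])
            if t > 0:                                # add noise on all but the final step
                z += np.sqrt(betas[t]) * rng.standard_normal(LATENT_DIM)
        return z

    def decode(latent: np.ndarray, length: int = 5) -> list:
        """Toy autoregressive decoder conditioned on the diffused latent at every step."""
        vocab = ["the", "model", "follows", "the", "plan"]
        proj = rng.standard_normal((len(vocab), LATENT_DIM)) * 0.1
        prev, tokens = 0, []
        for _ in range(length):
            logits = proj @ latent                   # conditioning on the latent "plan"
            logits[prev] -= 1.0                      # crude repetition penalty on the previous token
            prev = int(np.argmax(logits))
            tokens.append(vocab[prev])
        return tokens

    print(decode(sample_latent()))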