
Abstract
Transformer-based language models have become the dominant architecture for language generation. While they produce text of impressive quality, they tend to be hard to control. In the domain of image synthesis, by contrast, Denoising Diffusion Models (DDMs) are the dominant approach, offering unprecedented quality and controllability. Applying DDMs to discrete domains like language remains a challenging open problem. This talk addresses that challenge head-on. First, we introduce Latent Diffusion for Language Generation, which applies DDMs in the latent space of text autoencoders, enabling the generation of fluent text through latent diffusion. Second, we use diffusion models to generate semantic proposals that guide autoregressive text decoders. This approach combines the fluency of autoregression with the plug-and-play control of diffusion. Through these works, we demonstrate how diffusion models can be adapted to language, opening new avenues for flexible and controllable language generation systems.