The quality of generative models depends on the quality of the data on which they are trained. Access to high-quality data is scarce and expensive, while noisy samples are generally more accessible. State-of-the-art generative models are often trained on curated datasets that emerge from highly filtered data pools from the Web and other sources. In this talk, we will show that there is immense value in the lower-quality data that are often discarded. We will present an algorithmic framework to train generative models using a combination of a small set of expensive, high-quality samples and a large set of cheap, noisy samples. Our framework is instantiated for diffusion generative models, specifically through our Ambient Diffusion method. We will show how Ambient Diffusion enables training on noisy images and that it achieves state-of-the-art performance in de novo protein design. Time permitting, we will also present preliminary extensions to autoregressive language modeling and discuss broader implications for memorization, dataset design, and model performance.
Techniques for evaluating LLMs have regrettably fallen behind LLM development, making it challenging to quantify progress. My research calls for rethinking the fundamental principles underlying the evaluation of Transformer-based language models. I will discuss some work on applying language models to real tasks, as well as the selection of test data for efficient and robust evaluation. I will present challenges that arise when we do not know what the ground truth might be. Finally, I will discuss some ideas on evaluating Transformer language models without involving language.
Post-training is essential for enhancing the capabilities of large language models (LLMs) and aligning them with human preferences. One of the most widely used post-training techniques is reinforcement learning from human feedback (RLHF). In this talk, I will first discuss the challenges of applying RL to LLM training. Next, I will introduce RL algorithms that tackle these challenges by utilizing key properties of the underlying problem. Additionally, I will present an approach that simplifies the RL policy optimization process for LLMs to relative reward regression.
Activation functions play a pivotal role in deep neural networks, enabling them to tackle complex tasks like image recognition. However, activation functions also introduce significant challenges for deep learning theory, network dynamics analysis, and properties such as interpretability and privacy. In this talk, we revisit the necessity of activation functions, especially in cases where high-order interactions among the input elements are used, such as in the attention mechanism. Specifically, we highlight how high-order interactions are sufficient for retaining the necessary expressivity. Yet, the question remains: Is this expressivity alone sufficient for effective learning? We highlight networks that achieve strong performance both in demanding static tasks, such as ImageNet recognition, and sequence-to-sequence tasks, such as arithmetic tasks and language modeling.
In this talk, we study the problem of predicting (and optimizing) the counterfactual behavior of large-scale ML models. We start by focusing on “data counterfactuals,” where the goal is to estimate the effect of modifying a training dataset on the resulting machine learning outputs (and conversely, to design datasets that induce specific desired behavior). We introduce a method that almost perfectly estimates such counterfactuals, unlocking some new possibilities in the design and evaluation of ML models, including state-of-the-art data attribution, selection, and poisoning.