This workshop will address a central challenge in AI policy: designing regulatory interventions that are both evidence-based and adaptive. We will focus on two guiding questions: (1) how to strengthen knowledge flows from the research community to...
When multiple agents -- potentially consisting of a mixture of AI and human participants -- collaborate on a task, they often bring complementary abilities and information sources. This leads to a number of computational problems concerning the optimal way to synthesize decisions and to determine when and how one member of the group should delegate to others. We'll consider a set of related problems of this form, focusing on choices related to hand-off, delegation, and the role of distinct information sources. The talk is based on joint work with Solon Barocas, Kate Donahue, Nikhil Garg, Sophie Greenwood, Hoda Heidari, Karen Levy, Gali Noti, Sigal Oren, and Kenny Peng.
Synthetic data such as AI-generated texts and images are increasingly common on the Internet. Recent studies show that if models are iteratively retrained on Internet data mixed with these synthetic data, their performance may deteriorate over time, a concerning phenomenon often referred to as model collapse.
However, in practice, human engagement with Internet content (such as upvotes, likes, and follow-up interactions) can signal the quality of synthetic data. Can we save generative models from collapse by iteratively retraining them only on such “human-verified” synthetic data? Toward a principled understanding, we investigate this question in the fundamental linear regression setting, showing that retraining on verified synthetic data can avoid model collapse and, in fact, may even yield performance improvements. Our experiments across linear regression, Variational Autoencoders (VAEs) trained on MNIST, and fine-tuning SmolLM2-135M on the XSUM task confirm these theoretical insights.
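A minimal simulation can illustrate the contrast the abstract describes. The sketch below is a generic toy version of the linear-regression setting, not the paper's actual experiment: all parameters (`W_TRUE`, `SIGMA`, the acceptance threshold) are hypothetical, and "verification" is approximated by keeping only synthetic labels that agree with an independent ground-truth draw, as a stand-in for human quality signals.

```python
import numpy as np

W_TRUE, SIGMA, N, GENERATIONS, TRIALS = 2.0, 1.0, 50, 100, 30

def fit_slope(x, y):
    # Least-squares slope through the origin.
    return x @ y / (x @ x)

def run_chain(verify, rng):
    w = W_TRUE  # generation 0 is trained on real data, so it starts unbiased
    for _ in range(GENERATIONS):
        x = rng.normal(size=N)
        y_synth = w * x + SIGMA * rng.normal(size=N)  # labels from current model
        if verify:
            # Proxy for human quality signals: keep only synthetic labels that
            # agree with an independent ground-truth draw.
            y_real = W_TRUE * x + SIGMA * rng.normal(size=N)
            keep = np.abs(y_synth - y_real) < SIGMA
            if keep.sum() < 2:
                continue  # too few verified samples this round
            x, y_synth = x[keep], y_synth[keep]
        w = fit_slope(x, y_synth)  # retrain on (possibly filtered) synthetic data
    return abs(w - W_TRUE)

err_plain = np.mean([run_chain(False, np.random.default_rng(t)) for t in range(TRIALS)])
err_verified = np.mean([run_chain(True, np.random.default_rng(t)) for t in range(TRIALS)])
print(f"mean |w - w*| after {GENERATIONS} generations: "
      f"unverified={err_plain:.3f}, verified={err_verified:.3f}")
```

Without the filter, each refit adds fresh estimation noise and the slope drifts away from the truth like a random walk; the acceptance filter acts as a restoring force that keeps the iterates near the true model.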
Collaborative learning techniques can help train machine learning models that are superior to models trained on a single entity’s data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as different LLM providers. This can incentivize dishonest updates that damage other participants’ models and undermine the benefits of collaboration. In this talk, I will present a payment-based peer prediction mechanism that incentivizes participants to honestly report updates.
Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d. test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets. In contrast, evaluation is more challenging for generative models due to their open-ended nature: it is unclear which metrics are appropriate and whether such metrics can be reliably evaluated from finite samples. In this work, we introduce a theoretical framework for evaluating generative models and establish evaluability results for commonly used metrics.