Standard paradigms for LLM alignment and evaluation, such as RLHF for alignment and Elo-style rankings for leaderboard evaluation, implicitly assume a single underlying user utility model, an assumption that breaks down under heterogeneous human preferences. As a result, these methods can systematically fail to optimize even for average user satisfaction. In this talk, I will take a social choice perspective to revisit alignment and evaluation under heterogeneous preferences.
I will first discuss pluralistic alignment and introduce the distortion of AI alignment, a framework that captures the worst-case ratio between the optimal achievable average utility and the average utility of the learned policy. This framework characterizes information-theoretic limits of learning from pairwise comparisons, and draws sharp distinctions between alignment methods, showing that Nash Learning from Human Feedback is provably optimal, whereas standard approaches like RLHF and DPO can suffer high or even unbounded distortion. I will then turn to leaderboard-based evaluation and discuss the construction of pluralistic leaderboards, which aim to produce a single global ranking while ensuring fair and stable representation of diverse user populations.
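In symbols, the distortion notion described above can be sketched as follows (the notation is illustrative, not the talk's own):

```latex
\[
\operatorname{dist}(\pi) \;=\; \sup_{u \in \mathcal{U}}
\frac{\displaystyle \max_{\pi'} \;\tfrac{1}{n}\sum_{i=1}^{n} u_i(\pi')}
     {\displaystyle \tfrac{1}{n}\sum_{i=1}^{n} u_i(\pi)},
\]
```

where \(\mathcal{U}\) denotes the set of user utility profiles consistent with the observed pairwise comparisons and \(n\) is the number of users; a method has bounded distortion if this ratio is bounded for every policy it can output.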
Aligning AI systems with human values remains a fundamental challenge, but does our inability to create perfectly aligned models preclude obtaining the benefits of alignment? I will present a strategic setting where a human user interacts with multiple differently misaligned AI agents, none of which are individually well-aligned. Nonetheless, when the user's utility lies approximately within the convex hull of the agents' utilities, a condition that becomes easier to satisfy as model diversity increases, strategic competition can yield outcomes comparable to interacting with a perfectly aligned model. I will then move to a setting with multiple heterogeneous users and discuss the role of model personalization in emergent pluralistic alignment.
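The convex-hull condition above can be written out explicitly (the vectors, index \(k\), and tolerance \(\varepsilon\) here are illustrative labels, not the talk's notation): the user's utility is approximately a convex combination of the agents' utilities,

```latex
\[
\exists\, \lambda_1,\dots,\lambda_k \ge 0,\;\; \sum_{j=1}^{k}\lambda_j = 1:
\qquad
\Bigl\| \, u_{\mathrm{user}} - \sum_{j=1}^{k} \lambda_j\, u_j \, \Bigr\| \;\le\; \varepsilon .
\]
```

Intuitively, adding more diverse agents enlarges the hull \(\mathrm{conv}\{u_1,\dots,u_k\}\), making the condition easier to satisfy for a fixed \(u_{\mathrm{user}}\).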
Collaboration is crucial for reaching collective goals. However, its effectiveness is often undermined by the strategic behavior of individual agents, a fact captured by the high Price of Stability (PoS) observed in recent literature. Implicit in the traditional PoS analysis is the assumption that agents have full knowledge of how their tasks relate to one another. We offer a new perspective on bringing about efficient collaboration among strategic agents using information design. Inspired by the growing importance of collaboration in machine learning (such as platforms for collaborative federated learning and data cooperatives), we propose a framework in which the platform has more information than the agents about how their tasks relate to each other. We characterize how and to what degree such platforms can leverage this information advantage to steer strategic agents toward efficient collaboration.
Concretely, we consider collaboration networks where each node is a task type held by one agent, and each task benefits from contributions made within its inclusive neighborhood of tasks. This network structure is known to both the agents and the platform, but only the platform knows each agent's true location. We design two families of persuasive signaling schemes that the platform can use to ensure a small total workload when agents follow the signal. The first family aims to achieve the minmax-optimal approximation ratio relative to the optimal collaboration. The second family ensures a per-instance strict improvement over full information disclosure.
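As a toy illustration of this network model (the graph, the unit benefit threshold, and the contribution profiles below are hypothetical choices of ours, not taken from the talk), one can check whether a contribution profile satisfies every task and compare total workloads:

```python
# Minimal sketch: tasks are graph nodes; each task is satisfied if the total
# contribution within its inclusive (closed) neighborhood meets a threshold.
# Threshold 1.0 and the profiles below are illustrative assumptions.

def closed_neighborhood(graph, node):
    """The node itself plus its neighbors."""
    return {node} | set(graph[node])

def is_feasible(graph, contributions, threshold=1.0):
    """Every task must receive enough benefit from its closed neighborhood."""
    return all(
        sum(contributions[j] for j in closed_neighborhood(graph, i)) >= threshold
        for i in graph
    )

def total_workload(contributions):
    return sum(contributions.values())

# A path network over three task types: 0 -- 1 -- 2.
graph = {0: [1], 1: [0, 2], 2: [1]}

# Efficient profile: the central agent covers everyone.
efficient = {0: 0.0, 1: 1.0, 2: 0.0}
# Selfish profile: each agent covers only its own task.
selfish = {0: 1.0, 1: 1.0, 2: 1.0}

assert is_feasible(graph, efficient) and is_feasible(graph, selfish)
print(total_workload(efficient), total_workload(selfish))  # 1.0 3.0
```

The gap between the two feasible profiles (1.0 vs. 3.0) is the kind of inefficiency a signaling scheme would aim to close when agents do not know their own positions.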
When multiple agents -- potentially a mixture of AI and human participants -- collaborate on a task, they often bring complementary abilities and information sources. This leads to a number of computational problems related to the optimal way to synthesize decisions and to determine when and how one member of the group should delegate to others. We'll consider a set of related problems of this form, focusing on choices related to hand-off, delegation, and the role of distinct information sources. The talk is based on joint work with Solon Barocas, Kate Donahue, Nikhil Garg, Sophie Greenwood, Hoda Heidari, Karen Levy, Gali Noti, Sigal Oren, and Kenny Peng.
Synthetic data such as AI-generated texts and images are increasingly common on the Internet. Recent studies show that if models are iteratively retrained on Internet data mixed with these synthetic data, their performance may deteriorate over time, a concerning phenomenon often referred to as model collapse.
However, in practice, human engagement with Internet content (such as upvotes, likes, and follow-up interactions) can signal the quality of synthetic data. Can we save generative models from collapse by iteratively retraining them only on such “human-verified” synthetic data? Toward a principled understanding, we investigate this question in the fundamental linear regression setting, showing that retraining on verified synthetic data can avoid model collapse and, in fact, may even yield performance improvements. Our experiments across linear regression, Variational Autoencoders (VAEs) trained on MNIST, and fine-tuning SmolLM2-135M on the XSUM task confirm these theoretical insights.
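The retraining loop in the one-dimensional linear regression setting can be sketched as follows (the noise level, sample size, number of rounds, and the verification rule, which here keeps only synthetic samples whose label is close to the ground-truth response, are illustrative assumptions of ours, not the talk's exact model):

```python
import random

random.seed(0)
TRUE_SLOPE = 2.0  # ground truth: y = 2x + noise

def fit_slope(xs, ys):
    # OLS through the origin: slope = sum(x*y) / sum(x*x)
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def generate(slope, n=200, noise=0.5):
    # The current model plays the role of the data generator.
    xs = [random.uniform(-1, 1) for _ in range(n)]
    ys = [slope * x + random.gauss(0, noise) for x in xs]
    return xs, ys

def retrain(verify, rounds=20):
    slope = TRUE_SLOPE  # round 0: model fit on real data
    for _ in range(rounds):
        xs, ys = generate(slope)
        if verify:
            # Stand-in for "human verification": keep only synthetic samples
            # whose label is close to the ground-truth response.
            kept = [(x, y) for x, y in zip(xs, ys)
                    if abs(y - TRUE_SLOPE * x) < 0.3]
            xs, ys = zip(*kept)
        slope = fit_slope(xs, ys)
    return slope

print(retrain(verify=True), retrain(verify=False))
```

Without verification the slope estimate performs a random walk whose variance accumulates across rounds (the collapse mechanism); with the verification filter, kept samples are pulled back toward the ground-truth line each round, so the estimate stays near 2.0.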
Collaborative learning techniques can help train machine learning models that are superior to models trained on a single entity’s data. However, in many cases, potential participants in such collaborative schemes are competitors on a downstream task, such as different LLM providers. This can incentivize dishonest updates that damage other participants’ models and undermine the benefits of collaboration. In this talk, I will present a payment-based peer prediction mechanism that incentivizes participants to honestly report updates.
Statistical evaluation aims to estimate the generalization performance of a model using held-out i.i.d. test data sampled from the ground-truth distribution. In supervised learning settings such as classification, performance metrics such as error rate are well-defined, and test error reliably approximates population error given sufficiently large datasets. In contrast, evaluation is more challenging for generative models due to their open-ended nature: it is unclear which metrics are appropriate and whether such metrics can be reliably evaluated from finite samples. In this work, we introduce a theoretical framework for evaluating generative models and establish evaluability results for commonly used metrics.