This workshop will consider how to develop and facilitate collaborative learning systems that reflect desired social and economic principles. Analyzing and addressing such concerns is paramount for the ongoing success of collaborative learning and for the...
Attributing LLM outputs to the training examples that causally influence their behavior can give us visibility into LLMs’ opaque reasoning and help us understand subtle persona changes. Unfortunately, finding training data attribution algorithms that are both accurate and scalable has remained an elusive goal. I argue for separately studying an Estimation Problem (accurately estimating the causal effect of a training example) and a Retrieval Problem (efficiently finding the highest-scoring training examples). I then present a generic retrieval method for influential sequences that can be paired with a wide range of influence estimators (including EKFAC) and for which one can obtain high confidence about recall. I discuss how causal training data attribution can be used as a tool to assure LLM alignment.
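The estimation/retrieval split above can be pictured as a two-stage pipeline. The following is a purely illustrative sketch, not the method from the talk: a dot product between synthetic gradient vectors stands in for a real influence estimator (such as the EKFAC-based estimators mentioned), and a random low-dimensional projection stands in for whatever cheap proxy score a retrieval stage might use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each training example is represented by a gradient
# feature vector, and the influence of example i on a query is modeled as
# a dot product (a stand-in for a real estimator, e.g. EKFAC-based).
n_train, dim = 10_000, 64
train_grads = rng.normal(size=(n_train, dim))
query_grad = rng.normal(size=dim)

def influence(idx):
    """Estimation Problem: the (expensive) per-example causal-effect score."""
    return train_grads[idx] @ query_grad

# Retrieval Problem: cheaply shortlist candidates with a coarse proxy score
# (here, dot products in a random low-dimensional projection), then rerank
# only the shortlist with the full estimator.
proj = rng.normal(size=(dim, 8)) / np.sqrt(8)
proxy = (train_grads @ proj) @ (query_grad @ proj)
shortlist = np.argsort(proxy)[-500:]               # top candidates by proxy
exact = np.array([influence(i) for i in shortlist])
top10 = shortlist[np.argsort(exact)[-10:]]         # final top-10 by estimator

# Recall check: how many of the true top-10 did the two-stage pipeline keep?
true_top10 = np.argsort(train_grads @ query_grad)[-10:]
recall = len(set(top10) & set(true_top10)) / 10
```

The design point is that only the shortlist (500 examples) ever touches the expensive estimator, so the estimator and the retrieval proxy can be studied and improved independently; the talk's stronger claim, high-confidence guarantees on recall, is not captured by this toy.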
Empirical conclusions depend not only on data but on analytic decisions made throughout the research process. Many-analyst studies have quantified this: independent teams testing the same hypothesis on the same dataset often reach conflicting conclusions. But such studies require costly coordination and are rarely conducted. We show that fully autonomous AI analysts built on large language models (LLMs) can cheaply and at scale replicate this structured analytic diversity. In our framework, each AI analyst executes a complete analysis pipeline on a fixed dataset and hypothesis, while a separate AI auditor screens runs for methodological validity. Across three datasets, AI-generated analyses exhibit substantial dispersion in effect sizes, p-values, and conclusions, driven by systematic differences in preprocessing, model specification, and inference across LLMs and personas. Critically, outcomes are steerable: changing the analyst persona or model shifts the distribution of results even among valid analyses.
These findings highlight a central challenge for AI-automated empirical science: when defensible analyses are cheap, evidence becomes abundant and vulnerable to selective reporting. But the same capability suggests a solution: treating results as distributions makes analytic uncertainty visible, and deploying AI analysts on a fixed specification can reveal disagreement from underspecified choices. We therefore argue for new transparency norms: multiverse-style reporting and prompt disclosure, alongside code and data.
Joint work with Martin Bertran and Riccardo Fogliato.
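The multiverse-style reporting argued for above can be sketched in miniature. Everything here is illustrative: a synthetic dataset stands in for a real one, and a small grid of hand-enumerated preprocessing choices stands in for the analytic decisions that, in the framework described, would be made by LLM analysts and screened by an AI auditor.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Fixed dataset and hypothesis: does x predict y?  (synthetic stand-in)
n = 300
x = rng.normal(size=n)
y = 0.2 * x + rng.normal(size=n)
y[rng.choice(n, 5, replace=False)] += 8.0        # a few outliers

def run_analyst(drop_outliers, winsorize):
    """One 'analyst': a complete pipeline defined by its analytic choices.
    (In the framework above these choices come from an LLM analyst; here
    they are enumerated explicitly as a stand-in.)"""
    xi, yi = x.copy(), y.copy()
    if drop_outliers:
        keep = np.abs(yi - yi.mean()) < 3 * yi.std()
        xi, yi = xi[keep], yi[keep]
    if winsorize:
        lo, hi = np.quantile(yi, [0.05, 0.95])
        yi = np.clip(yi, lo, hi)
    # OLS slope as the reported effect size
    return np.polyfit(xi, yi, 1)[0]

# The 'multiverse': every combination of defensible analytic choices.
effects = [run_analyst(d, w) for d, w in product([False, True], repeat=2)]

# Report the distribution of results, not one cherry-picked estimate.
print(f"effect sizes range from {min(effects):.3f} to {max(effects):.3f}")
```

Even this tiny grid yields visibly different effect sizes from the same data and hypothesis, which is the dispersion the abstract describes; reporting the whole distribution (rather than a single run) is what makes the analytic uncertainty visible.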
When designing compound AI systems, a common approach is to query multiple copies of the same model and aggregate the responses to produce a synthesized output. Given the homogeneity of these models, this raises the question of whether aggregation unlocks access to a greater set of outputs than querying a single model. In this talk, we investigate the power and limitations of aggregation within a stylized principal-agent framework. This framework models how the system designer can partially steer each agent's output through its reward function specification, but still faces limitations due to prompt engineering ability and model capabilities. Our analysis uncovers three natural mechanisms -- feasibility expansion, support expansion, and binding set contraction -- through which aggregation expands the set of outputs that are elicitable by the system designer. We prove that any aggregation operation must implement one of these mechanisms in order to be elicitability-expanding, and that strengthened versions of these mechanisms provide necessary and sufficient conditions that fully characterize elicitability-expansion. Finally, we provide an empirical illustration of our findings for LLMs deployed in a toy reference-generation task. Altogether, our results take a step towards characterizing when compound AI systems can overcome limitations in model capabilities and in prompt engineering.
Based on joint work with Nivasini Ananthakrishnan.
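A minimal numeric toy (not the paper's formal principal-agent model) can illustrate one of the three mechanisms, feasibility expansion: aggregating two differently steered copies of the same agent produces an output that neither copy can produce on its own.

```python
import statistics

# Toy illustration: each 'agent' is a copy of the same base policy, steered
# by a reward weight chosen by the system designer. An agent best-responds
# by picking the output in its feasible set that maximizes its reward.
feasible = [0.0, 1.0]   # a single agent can only emit 0 or 1

def best_response(reward_weight):
    # The agent maximizes reward_weight * output over its feasible set.
    return max(feasible, key=lambda o: reward_weight * o)

# Feasibility expansion: averaging the two steered agents' outputs yields
# 0.5, a value outside any single agent's feasible set.
agents = [best_response(+1.0), best_response(-1.0)]   # → [1.0, 0.0]
aggregate = statistics.mean(agents)                    # → 0.5
```

The averaging step here is one concrete aggregation operation; the abstract's result is much stronger, characterizing exactly which aggregation operations can be elicitability-expanding at all.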