I will discuss issues around the human values embedded both in widely used clinical equations and in the recent large language models being used in clinical care. I will also discuss how the behavior of these models, old and new, can be steered, including the potential for substantial changes to diagnostic and management reasoning.
There is a long-running debate over using standardized test scores in college and graduate admissions, with some arguing that test scores are an important signal of academic qualification and others arguing that they are a biased measure of ability. Here we revisit this issue by analyzing a novel dataset of more than 16,000 applications over roughly a decade to a large public policy master’s program in the United States. Consistent with past work, we find that GRE scores substantially improve predictions of first-year grades, relative to predictions based on GPA alone. However, when these predictions are used to inform admissions decisions, we find that test scores only modestly improve the expected academic quality of admitted students. The gap between using and not using test scores is further reduced when we augment the baseline GPA-based predictions with more fine-grained information available in student transcripts and other application materials. To explain these results, we observe that even with improved predictions, the downstream admissions decisions are often the same; and where they differ, the difference often comes down to selecting between similarly qualified applicants. Our results highlight the importance of distinguishing between predictions and decisions when assessing the marginal value of test scores in admissions.
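To make the prediction-versus-decision distinction concrete, the following is a minimal synthetic sketch in Python. It is not the paper's data, models, or results: the generative model, sample sizes, and noise scales are illustrative assumptions, chosen only to show how a gain in predictive accuracy can coexist with largely overlapping top-k admission decisions.

import numpy as np

# Illustrative synthetic setup (all parameters are assumptions, not the paper's).
rng = np.random.default_rng(0)
n_applicants, n_admit = 2000, 200

# Latent academic ability drives GPA, test score, and first-year grades.
ability = rng.normal(size=n_applicants)
gpa = ability + rng.normal(scale=1.0, size=n_applicants)
test = ability + rng.normal(scale=1.0, size=n_applicants)
grades = ability + rng.normal(scale=0.5, size=n_applicants)

def fit_predict(X, y):
    """Ordinary least squares predictions of y from the columns of X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta

def r2(pred, y):
    """In-sample coefficient of determination."""
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

pred_gpa = fit_predict(gpa[:, None], grades)
pred_both = fit_predict(np.column_stack([gpa, test]), grades)

# Admit the top-k applicants under each predictor and compare the decisions.
admit_gpa = set(np.argsort(-pred_gpa)[:n_admit])
admit_both = set(np.argsort(-pred_both)[:n_admit])
overlap = len(admit_gpa & admit_both) / n_admit

print(f"R^2, GPA only:   {r2(pred_gpa, grades):.3f}")
print(f"R^2, GPA + test: {r2(pred_both, grades):.3f}")
print(f"Overlap of admitted cohorts: {overlap:.0%}")
print(f"Mean grades, GPA-only admits: {grades[list(admit_gpa)].mean():.3f}")
print(f"Mean grades, GPA+test admits: {grades[list(admit_both)].mean():.3f}")

Running the sketch prints the in-sample R^2 of each predictor, the overlap between the two admitted cohorts, and the mean realized grades of each cohort, so the gap between better predictions and better decisions can be read off directly.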
As AI models become increasingly powerful, it is an attractive proposition to use them in important decision-making pipelines, in collaboration with human decision makers. But how should a human being and a machine learning model collaborate to reach decisions that are better than either of them could achieve alone? If the human and the AI model were perfect Bayesians operating in a setting with a commonly known and correctly specified prior, Aumann’s classical agreement theorem would give us one answer: they could engage in conversation about the task at hand, and their conversation would be guaranteed to converge to (accuracy-improving) agreement. This classical result, however, requires many implausible assumptions about the knowledge and computational power of both parties. We show how to recover similar (and more general) results using only computationally and statistically tractable assumptions, which substantially relax full Bayesian rationality. In the second part of the talk, we consider a more difficult problem: the AI model might be acting, at least in part, to advance the interests of its designer rather than the interests of its user, and these interests might be in tension. We show how market competition between different AI providers can mitigate this problem under only a mild “market alignment” assumption (that the user’s utility function lies in the convex hull of the AI providers’ utility functions), even when no single provider is well aligned. In particular, we show that in all Nash equilibria of the game among AI providers under this market alignment condition, the user is able to advance her own goals as well as she could have in collaboration with a perfectly aligned AI model.
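The “market alignment” assumption described above can be written as a convex-hull condition. The notation here (user utility $u$, provider utilities $u_1, \dots, u_k$, weights $\lambda_i$) is ours, not necessarily the papers':

\[
  u \;=\; \sum_{i=1}^{k} \lambda_i\, u_i,
  \qquad \lambda_i \ge 0,
  \qquad \sum_{i=1}^{k} \lambda_i = 1,
\]

for some weights $\lambda_1, \dots, \lambda_k$; no single $u_i$ need equal $u$ for the condition to hold.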
This talk describes the results of three papers, which are joint works with Natalie Collina, Ira Globus-Harris, Surbhi Goel, Varun Gupta, Emily Ryu, and Mirah Shi:
Tractable Agreement Protocols: https://arxiv.org/abs/2411.19791 (STOC 2025)
Collaborative Prediction: Tractable Information Aggregation via Agreement: https://arxiv.org/abs/2504.06075 (SODA 2026)
Emergent Alignment from Competition: https://arxiv.org/abs/2509.15090
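As background for the abstract above, here is a toy Python simulation of the classical common-prior agreement dynamic (Aumann; Geanakoplos and Polemarchakis), in which two parties alternately announce their posterior expectations and are guaranteed to converge to agreement. This illustrates only the classical starting point, not the tractable protocols of the papers listed above; the state space, partitions, and target quantity are made-up assumptions.

from fractions import Fraction

# Finite state space with a common uniform prior.
STATES = range(12)
f = {w: Fraction(w % 3, 2) for w in STATES}  # quantity both parties estimate

# Each party's private signal is the cell of its partition containing the true state.
PARTITION_A = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]
PARTITION_B = [{0, 4, 8}, {1, 5, 9}, {2, 6, 10}, {3, 7, 11}]

def cell(partition, w):
    return next(c for c in partition if w in c)

def expectation(event):
    """Posterior mean of f given an event, under the uniform prior."""
    return sum(f[w] for w in event) / len(event)

def agreement_protocol(true_state, max_rounds=20):
    public = set(STATES)  # event that is common knowledge so far
    announcements = []
    for _ in range(max_rounds):
        for partition in (PARTITION_A, PARTITION_B):
            my_info = cell(partition, true_state) & public
            e = expectation(my_info)
            announcements.append(e)
            # Everyone learns the announcement: keep only states that
            # would have produced the same announcement.
            public = {w for w in public
                      if expectation(cell(partition, w) & public) == e}
        if announcements[-1] == announcements[-2]:
            return announcements  # agreement reached
    return announcements

print([str(e) for e in agreement_protocol(true_state=6)])

With the example state and partitions above, the announcements converge within two rounds (to 1/2, 0, 0, 0), illustrating the finite-state guarantee that the conversation cannot end in disagreement.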
Abstract not available