We explain how hallucinations in language models arise in the first place. Prior work has shown how to modify models to reduce hallucinations. We also argue that hallucinations persist because benchmarks inadvertently reward guessing when unsure, and we discuss incentive-compatible ways to modify the benchmarks.
Joint work with Santosh Vempala, Ofir Nachum and Edwin Zhang.
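
The claim that benchmarks reward guessing can be illustrated with a small expected-score calculation. This is a minimal sketch, not the method from the work itself: the function names, the abstention score of 0, and the wrong-answer penalty are all hypothetical choices made for illustration.

```python
# Illustrative sketch only: the grading scheme and all names here are
# assumptions, not taken from the abstract.

ABSTAIN_SCORE = 0.0  # assumed score for answering "I don't know"


def expected_score(p_correct: float, wrong_penalty: float = 0.0) -> float:
    """Expected score of answering when the answer is right with probability p_correct.

    A correct answer earns 1 point; a wrong answer loses `wrong_penalty` points.
    wrong_penalty = 0 corresponds to plain binary (right/wrong) grading.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct: float, wrong_penalty: float = 0.0) -> str:
    """Return the score-maximizing action under the assumed grading scheme."""
    if expected_score(p_correct, wrong_penalty) > ABSTAIN_SCORE:
        return "guess"
    return "abstain"


if __name__ == "__main__":
    # Binary grading: guessing beats abstaining even at 25% confidence.
    print(best_action(0.25))
    # With a penalty of 1 point per wrong answer, the same model should abstain.
    print(best_action(0.25, wrong_penalty=1.0))
```

Under binary grading, any nonzero chance of being right makes guessing the better strategy, so a model tuned to maximize such a score has no reason to express uncertainty. Under the assumed penalty scheme, answering pays off only when `p_correct` exceeds `wrong_penalty / (1 + wrong_penalty)`, which is one way a benchmark could stop rewarding blind guesses.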