Abstract
Large-scale vision benchmarks have driven---and often even defined---progress in machine learning. However, these benchmarks are merely proxies for the real-world tasks we actually care about. How well do our benchmarks capture such tasks?
In this talk, I will briefly survey examples of misalignment between the popular ImageNet benchmark and the real-world use case that motivates it. I will then discuss how tools such as statistical modeling can aid our efforts to systematically diagnose (and mitigate) this kind of misalignment.
Based on joint work with Logan Engstrom, Andrew Ilyas, Shibani Santurkar, Jacob Steinhardt, Dimitris Tsipras, and Kai Xiao.