Abstract

Our society increasingly relies on algorithms and data analysis to make critical decisions. Yet, almost all work in the theory of supervised learning has long relied on the following two assumptions:
1. Distributional assumptions: the data satisfies conditions such as Gaussianity or uniformity.
2. No distribution shift: the data distribution does not change between training and deployment.

While natural and often reasonable, these assumptions frequently do not hold. Nevertheless, they are routinely made when giving theoretical guarantees for supervised learning algorithms.

Such guarantees can become null and void should either of these assumptions fail.

Overall, if critical decisions rely on theoretical reliability guarantees, incorrect assumptions can result in catastrophic failure. This bootcamp talk discusses how property testing can be used to mitigate this dependence. We introduce and develop testers that can alert the user when such assumptions are not satisfied. Leveraging insights from the area of property testing, we discuss how to construct such testers for a number of well-studied function classes, addressing both distributional assumptions and distribution shift.