In machine learning, interpolation refers to fitting a model so that it matches the training data exactly, driving the training error to (near) zero. Achieving this typically requires heavily overparameterized models, which makes their generalization ability challenging to understand.
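As a minimal illustration of interpolation (the data and setup here are hypothetical): a degree-5 polynomial has six coefficients, exactly enough parameters to pass through six training points, so the training error vanishes regardless of how the model behaves between them.

```python
import numpy as np

# Hypothetical 1-D regression data: 6 noisy training points.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 6)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(6)

# A degree-5 polynomial has 6 coefficients -- enough parameters
# to pass through all 6 points, i.e. to interpolate them.
coeffs = np.polyfit(x, y, deg=5)
train_error = np.max(np.abs(np.polyval(coeffs, x) - y))
print(train_error)  # essentially zero: the model interpolates the data
```

The fit achieves zero training error by construction, yet nothing in that number alone tells us how the model behaves off the training points, which is the generalization question.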
Despite these difficulties, deep networks have been remarkably successful in the interpolation regime. To unravel the mystery of their generalization, various principles have been proposed, including stability, regularization, and refined information-theoretic generalization bounds.
In this talk, we will consider Stochastic Convex Optimization as a case study and address the question: why does Stochastic Gradient Descent (SGD) generalize? We will investigate the role of stability and regularization in explaining the generalization of algorithms in this ostensibly simple setting. Our findings reveal that, even in this setup where learnability can be demonstrated, the picture is intricate: SGD and other learning algorithms do not align with the traditional principles.
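For readers unfamiliar with the setting, here is a minimal sketch of SGD on a stochastic convex problem (least-squares mean estimation, with illustrative constants; this is a generic textbook instance, not the specific construction studied in the talk):

```python
import numpy as np

# Stochastic convex objective: f(w; z) = 0.5 * ||w - z||^2,
# with samples z_i drawn around an unknown population mean.
rng = np.random.default_rng(1)
d, n, lr = 5, 500, 0.1          # dimension, sample size, base step size
true_mean = np.ones(d)
samples = true_mean + rng.standard_normal((n, d))

w = np.zeros(d)
for t, z in enumerate(samples, start=1):
    grad = w - z                 # gradient of f(w; z) at the current iterate
    w -= (lr / np.sqrt(t)) * grad  # standard 1/sqrt(t) step-size schedule

# Distance to the population optimum; small here, but in general the
# mechanism behind such generalization is exactly what is in question.
print(np.linalg.norm(w - true_mean))
```

One pass of SGD over fresh samples generalizes well on this instance; the puzzle addressed in the talk is which principle, if any, accounts for this behavior in the general stochastic convex case.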