Abstract

Empirical risk minimization (ERM) is the dominant paradigm in statistical learning. Optimizing the empirical risk of neural networks is a highly non-convex problem, yet it is routinely solved to optimality or near-optimality using first-order methods such as stochastic gradient descent. It has recently been argued that overparametrization plays a key role in explaining this puzzle: overparametrized models are easy to optimize. I will present examples of this phenomenon in some simple models and discuss lessons that can be learnt from a broader perspective.
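As a concrete illustration of the phenomenon described above (not code from the talk), the following minimal sketch fits a heavily overparametrized two-layer ReLU network to arbitrary labels by plain gradient descent on the empirical risk; despite the non-convexity, the training loss is driven close to zero. The dimensions, the parametrization, and the step size are all illustrative assumptions.

```python
# A minimal, illustrative sketch: an overparametrized two-layer ReLU network,
# hidden width m much larger than the number of samples n, fit to arbitrary
# labels by plain gradient descent on the empirical risk.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                 # n samples, input dimension d, width m >> n
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)            # arbitrary labels: a pure memorization task

# f(x) = (1/sqrt(m)) * sum_j a_j * relu(w_j . x); output weights a are frozen
W = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)

lr = 0.2                              # illustrative step size
for step in range(3001):
    Z = X @ W.T                       # (n, m) pre-activations
    H = np.maximum(Z, 0.0)            # ReLU
    resid = H @ a / np.sqrt(m) - y    # predictions minus labels
    loss = 0.5 * np.mean(resid ** 2)  # empirical risk (squared loss)
    # Gradient of the empirical risk with respect to the hidden weights W
    G = ((resid[:, None] * (Z > 0) * a[None, :]) / (n * np.sqrt(m))).T @ X
    W -= lr * G
    if step % 500 == 0:
        print(f"step {step:4d}  empirical risk {loss:.6f}")
```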
