Abstract

The set of minimizers of optimization problems in deep learning is typically either large and high-dimensional or empty. Which minimizer is found in practice depends on the optimization algorithm, its initial condition, and its hyperparameters. In a continuous-time model for stochastic gradient descent, we can analyze the invariant distribution of the algorithm and show that it finds minimizers where the loss landscape is 'flat' in a precise sense. The notion of flatness depends crucially on how the noise intensity at a point scales with the value of the objective function. Under stronger technical conditions, we prove exponential convergence to the invariant distribution.
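As a minimal illustration of the kind of model involved (the talk's exact dynamics and noise scaling may differ; the symbols L, \theta, \varepsilon, \eta, W_t below are introduced here only for exposition), the simplest continuous-time surrogate for stochastic gradient descent on a loss L is an overdamped Langevin equation with constant noise intensity \varepsilon > 0,

d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2\varepsilon}\, dW_t,

whose invariant density is the Gibbs measure \pi_\varepsilon(\theta) \propto \exp\!\big(-L(\theta)/\varepsilon\big), which weights all minimizers of L comparably. The abstract concerns the regime where the noise intensity instead scales with the value of L(\theta) itself, schematically a diffusion of the form

d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{\eta\, L(\theta_t)}\, dW_t,

in which the noise vanishes at global minimizers and the resulting invariant measure can concentrate on minimizers that are 'flat' in the sense made precise in the talk.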
