Non-Parametric Convergence Rates for Plain Vanilla Stochastic Gradient Descent

Monday, Dec. 6, 2021, 1:00 pm – 1:15 pm



Raphaël Berthier (École polytechnique fédérale de Lausanne)


Calvin Lab Auditorium

Most theoretical guarantees for stochastic gradient descent (SGD) assume that the iterates are averaged, that the step sizes are decreasing, and/or that the objective is regularized. However, practice shows that these tricks are less necessary than theoretically expected. I will present an analysis of SGD that uses none of them: we analyze the behavior of the last iterate of fixed step-size, non-regularized SGD. Our results apply to kernel regression, i.e., infinite-dimensional linear regression. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from observations of its value at randomly sampled points.
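The setting in the special case can be sketched as follows: online SGD on the squared loss in an RKHS, where each step adds a kernel bump at the freshly sampled point, with a fixed step size, no regularization, and no iterate averaging. This is a minimal illustrative sketch, not code from the talk; the Gaussian kernel, bandwidth, step size, target function, and noiseless observations are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Assumed target function on [0, 1], chosen for illustration only.
    return np.sin(2 * np.pi * x)

def K(x, xs, bandwidth=0.1):
    # Gaussian kernel (an assumed choice; the talk concerns kernel regression generally).
    return np.exp(-((x - xs) ** 2) / (2 * bandwidth ** 2))

gamma = 0.5          # FIXED step size: no decreasing schedule
T = 2000             # number of online samples
centers = np.empty(T)
coefs = np.empty(T)

for t in range(T):
    x = rng.random()                       # point sampled uniformly on [0, 1]
    y = target(x)                          # observe the function value there
    pred = coefs[:t] @ K(x, centers[:t])   # current iterate evaluated at x (0.0 when t == 0)
    # SGD step on the squared loss in the RKHS:
    #   f <- f - gamma * (f(x) - y) * K(x, .)
    # i.e., append one kernel bump centered at x; no regularization term.
    centers[t] = x
    coefs[t] = -gamma * (pred - y)

# Evaluate the LAST iterate (no averaging) on a grid.
grid = np.linspace(0, 1, 200)
est = np.array([coefs @ K(g, centers) for g in grid])
```

The update is the standard stochastic gradient of the squared loss in the RKHS; keeping only the last iterate and a constant `gamma` mirrors the regime analyzed in the talk, though the convergence rates themselves are the subject of the presentation, not of this sketch.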