Murat A. Erdogdu (University of Toronto)
In this talk, we discuss the convergence properties of the online SGD when (i) the noise variance is infinity or (ii) objective function is non-convex. The first part of the talk will focus on the convergence of SGD under heavy-tailed gradient noise with a potentially infinite variance, for a class of strongly convex objectives. In this setting, we show that SGD can converge to the global optimum for a wide range of step-size choices, without necessitating any modification neither to the loss function or to the algorithm itself. The second part of the talk will focus on SGD with constant step-size in the non-convex regime. We will establish the convergence of SGD when the objective function has quadratic tails (allowing for non-convexity), and characterize its bias, i.e., the distance between the SGD iterates and the critical points of the objective under various local regularity conditions. We will discuss the limit distribution of Polyak-Ruppert averaging in both regimes, and establish (generalized) central limit theorems.