Abstract

Stochastic Gradient Descent (SGD) is the basic first-order stochastic optimization algorithm behind the powerful deep learning architectures that are becoming increasingly prevalent in society. In this lecture, we motivate the use of stochastic first-order methods and recall some convergence results for SGD. We then discuss the notion of importance sampling for SGD and how it can improve the convergence rate. Finally, we discuss methods for making SGD more "robust" to the hyperparameters of the algorithm, such as the step size, using "on the fly" adaptive step-size methods such as AdaGrad, and present some theoretical results.
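To fix ideas before the lecture, the following is a minimal illustrative sketch (not taken from the lecture) contrasting a plain SGD step with an AdaGrad-style per-coordinate adaptive step on a toy least-squares problem; the problem instance, base step size `eta`, and smoothing constant `eps` are assumptions made for the example.

```python
import numpy as np

# Toy least-squares problem: minimize (1/2n) * sum_i (A[i] @ x - b[i])**2
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 5))
b = rng.standard_normal(100)

def stochastic_grad(x, i):
    # Gradient of the i-th term 0.5 * (A[i] @ x - b[i])**2
    return (A[i] @ x - b[i]) * A[i]

eta, eps = 0.1, 1e-8          # illustrative choices
x_sgd = np.zeros(5)
x_ada = np.zeros(5)
g_sq_sum = np.zeros(5)        # AdaGrad: running sum of squared gradients

for t in range(1000):
    i = rng.integers(len(b))                       # uniformly sampled example

    g = stochastic_grad(x_sgd, i)
    x_sgd -= eta / np.sqrt(t + 1) * g              # SGD with a decaying step size

    g = stochastic_grad(x_ada, i)
    g_sq_sum += g**2
    x_ada -= eta / (np.sqrt(g_sq_sum) + eps) * g   # AdaGrad per-coordinate step
```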

Video Recording