
Abstract
We design a stochastic algorithm to train any smooth neural network to ε-approximate local minima, using O(ε^{-3.25}) backpropagations. The best previously known result was essentially O(ε^{-4}), achieved by SGD.
More broadly, the algorithm finds ε-approximate local minima of any smooth nonconvex function at rate O(ε^{-3.25}), using only oracle access to stochastic gradients and Hessian-vector products.
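
A minimal sketch (not the paper's algorithm) of the oracle model the abstract refers to: both stochastic gradients and Hessian-vector products can be obtained by automatic differentiation at roughly the cost of an extra backpropagation, without ever forming the Hessian. The loss function and data below are hypothetical placeholders.

import jax
import jax.numpy as jnp

def loss(w, batch):
    # Hypothetical smooth nonconvex loss on one mini-batch (x, y).
    x, y = batch
    pred = jnp.tanh(x @ w)
    return jnp.mean((pred - y) ** 2)

def stochastic_grad(w, batch):
    # Stochastic gradient oracle: gradient of the mini-batch loss.
    return jax.grad(loss)(w, batch)

def hessian_vector_product(w, batch, v):
    # Hessian-vector product oracle: directional derivative of the gradient
    # along v, computed forward-over-reverse; the Hessian is never materialized.
    return jax.jvp(lambda u: jax.grad(loss)(u, batch), (w,), (v,))[1]

# Example usage with random data.
key = jax.random.PRNGKey(0)
w = jax.random.normal(key, (5,))
batch = (jax.random.normal(key, (8, 5)), jnp.zeros(8))
v = jnp.ones(5)
g = stochastic_grad(w, batch)
hv = hessian_vector_product(w, batch, v)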