Abstract

We will describe recent results on applying second-order methods to training deep neural networks. We will first introduce a new systematic approach to model compression using second-order information, resulting in unprecedentedly small models for a range of challenging tasks in image classification, object detection, and natural language processing, exceeding *all* industry-level results, including those from expensive AutoML-based methods searched at massive scale.
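To make the compression idea concrete, here is a minimal sketch (our illustration, not the exact method from the talk) of using Hessian curvature as a layer-sensitivity signal: the top Hessian eigenvalue of each layer is estimated matrix-free via power iteration on Hessian-vector products, and flatter layers are assigned lower quantization bit-widths. The helpers `top_hessian_eigenvalue` and `assign_bits`, the PyTorch setting, and the coarse bit-assignment rule are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def top_hessian_eigenvalue(loss, param, iters=20):
    """Estimate the top Hessian eigenvalue of `loss` w.r.t. one parameter
    tensor via power iteration on Hessian-vector products (no full Hessian)."""
    v = torch.randn_like(param)
    v /= v.norm()
    # First backward pass with create_graph=True so we can differentiate again.
    grad = torch.autograd.grad(loss, param, create_graph=True)[0]
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. param.
        hv = torch.autograd.grad(grad, param, grad_outputs=v, retain_graph=True)[0]
        eig = torch.dot(hv.flatten(), v.flatten()).item()  # Rayleigh quotient
        v = hv / (hv.norm() + 1e-12)
    return eig

def assign_bits(model, loss, bit_choices=(2, 4, 8)):
    """Hypothetical rule: layers with flat curvature (small top eigenvalue)
    tolerate aggressive quantization; sharp layers keep higher precision."""
    sens = {name: top_hessian_eigenvalue(loss, p)
            for name, p in model.named_parameters() if p.dim() > 1}
    ranked = sorted(sens, key=sens.get)  # flattest layers first
    bits = {}
    for i, name in enumerate(ranked):
        bits[name] = bit_choices[min(i * len(bit_choices) // len(ranked),
                                     len(bit_choices) - 1)]
    return bits

# Toy usage (illustrative):
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
loss = nn.functional.cross_entropy(model(torch.randn(32, 8)),
                                   torch.randint(0, 4, (32,)))
print(assign_bits(model, loss))
```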

Second, we will address the common misconception that computing second-order information is slow by presenting a new scalable framework for computing Hessian information. We will show that this Hessian information can be used during training with little overhead, resulting in a 3.58x speed-up in total training time compared to state-of-the-art first-order methods for ResNet18 training on ImageNet.
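To illustrate why computing Hessian information need not be slow, the sketch below estimates the Hessian trace matrix-free with Hutchinson's method: each random probe costs roughly one extra backward pass via a Hessian-vector product (Pearlmutter's trick), so the full Hessian is never formed. The toy model, loss, and probe count are illustrative assumptions, not the framework described above.

```python
import torch
import torch.nn as nn

# Illustrative model and loss; any differentiable loss works.
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
# create_graph=True keeps the graph so we can differentiate the gradients.
grads = torch.autograd.grad(loss, params, create_graph=True)

trace_samples = []
for _ in range(10):  # a handful of random probes is typically enough
    # Rademacher probe vectors (+1/-1 entries), one per parameter tensor.
    vs = [torch.randint_like(p, high=2) * 2.0 - 1.0 for p in params]
    # Hessian-vector product via a second backward pass.
    hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
    # E[v^T H v] over Rademacher v equals tr(H).
    trace_samples.append(sum((h * v).sum() for h, v in zip(hvs, vs)).item())

print("estimated Hessian trace:", sum(trace_samples) / len(trace_samples))
```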

Finally, we will discuss future directions involving stochastic second-order methods for accelerating the training of neural networks, and how loss landscape curvature can be used as a reward function for searching for new architectures.