Abstract

Efficient and automatic optimization of neural network structures is a key challenge in modern deep learning. Compared with parameter optimization, which has been well addressed by (stochastic) gradient descent, optimizing model structures involves a significantly more challenging combinatorial optimization problem, with large search spaces and expensive evaluation functions. Despite rapid recent progress, designing the best architectures still requires substantial expert knowledge or expensive black-box optimization approaches (including reinforcement learning).

This work extends the power of gradient descent to the domain of model structure optimization. In particular, we consider the problem of progressively growing a neural network by “splitting” existing neurons into several “offspring”, and develop a simple and practical approach for deciding the best subset of neurons to split and how to split them, adaptively based on the existing structure. Our approach is derived by viewing structure optimization as functional optimization in a space of distributions, such that our splitting strategy is a second-order functional steepest descent for escaping saddle points in an ∞-Wasserstein metric space, while standard parametric gradient descent is the first-order functional steepest descent. Our method provides a very fast way to find efficient and compact neural architectures, especially in resource-constrained settings where smaller models are preferred due to inference-time and energy constraints.
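To make the splitting idea concrete, below is a minimal, hypothetical sketch (not the authors' released implementation). It assumes that a per-neuron "splitting matrix" S has already been estimated from data; the sketch only illustrates the decision rule that the abstract describes: a neuron is a candidate for splitting when the smallest eigenvalue of S is negative, and its offspring are placed symmetrically along the corresponding eigenvector. The helper name `split_neuron` and the toy numbers are assumptions for illustration.

```python
import numpy as np

def split_neuron(theta, S, step=1e-2):
    """Sketch of the splitting rule: return two offspring weight vectors,
    or None if splitting this neuron is not expected to help.

    theta : (d,) parent neuron weights
    S     : (d, d) symmetric splitting matrix (assumed precomputed)
    """
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    lam_min, v_min = eigvals[0], eigvecs[:, 0]
    if lam_min >= 0:
        # No negative curvature direction: splitting gives no first-order gain.
        return None
    # Offspring inherit the parent's weights, perturbed along the
    # minimum eigenvector in opposite directions.
    return theta + step * v_min, theta - step * v_min

# Toy example with assumed values.
theta = np.array([0.5, -1.0, 0.2])
S = np.array([[ 1.0,  0.2, 0.0],
              [ 0.2, -0.5, 0.1],
              [ 0.0,  0.1, 0.8]])
print(split_neuron(theta, S))
```

In this reading, ranking neurons by how negative their smallest eigenvalue is would give the "best subset of neurons to split" mentioned above; the details of estimating S and re-weighting the offspring are left to the paper.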

Video Recording