Abstract

We study the question of what functions can be learned by neural network training algorithms, and prove surprisingly general lower bounds in the realizable case, where the function to be learned can be represented exactly as a neural network. Previous lower bounds apply only to special activation functions or unnatural input distributions, or to highly restricted classes of algorithms. We prove lower bounds for learning a family of functions realizable as small neural networks with a single hidden layer, for any activation function from a broad family (including sigmoid and ReLU) and over any "nice" input distribution (e.g., Gaussian or uniform). Our results apply to any statistical query algorithm, including all known neural network training algorithms, regardless of the model architecture, loss function, or optimization routine.
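For concreteness, and with notation that is illustrative rather than taken from the talk (the symbols k, a_i, w_i, b_i, \phi, \tau are placeholders), the single-hidden-layer networks in question compute functions of the form

\[ f(x) \;=\; \sum_{i=1}^{k} a_i\,\sigma\big(\langle w_i, x\rangle + b_i\big), \qquad \sigma \in \{\text{sigmoid},\ \text{ReLU},\ \ldots\}, \]

and a statistical query algorithm accesses the input distribution D and the target f only through answers v satisfying

\[ \big| v - \mathbb{E}_{x \sim D}\big[\phi\big(x, f(x)\big)\big] \big| \;\le\; \tau \]

for query functions \phi of its choosing and a tolerance \tau. Mini-batch gradient estimates are (approximate) expectations of this kind, which is why standard gradient-based training algorithms fall within the statistical query framework.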
 
Joint work with Le Song, Santosh Vempala, and Bo Xie
