Abstract

A dominant learning strategy is to train a neural network by minimizing its loss using gradient descent, or perhaps stochastic gradient descent. In this talk we ask: which learning problems are potentially learnable with such an approach? Can such “differentiable learning” go beyond what is learnable with kernel methods, including the Tangent Kernel approximation? How is this related to learning with Statistical Queries? Does differentiable learning require stochasticity, that is, training with Stochastic Gradient Descent (SGD) rather than batch Gradient Descent, and is SGD more powerful than Gradient Descent?
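To make the distinction in the last question concrete, here is a minimal sketch, not taken from the talk, of the two training procedures on a toy linear model with squared loss: batch Gradient Descent uses the exact gradient over all the data at every step, while SGD uses a noisy gradient from a small random minibatch. The toy data, step sizes, and batch size are illustrative assumptions.

import numpy as np

# Toy regression data (illustrative assumption, not the speakers' setting).
rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w, Xb, yb):
    # Gradient of the mean squared loss 0.5 * ||Xb @ w - yb||^2 / len(yb).
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Batch Gradient Descent: each step uses the exact gradient over all n examples.
w_gd = np.zeros(d)
for _ in range(500):
    w_gd -= 0.1 * grad(w_gd, X, y)

# Stochastic Gradient Descent: each step uses a noisy gradient from a minibatch.
w_sgd = np.zeros(d)
for _ in range(500):
    idx = rng.choice(n, size=8, replace=False)
    w_sgd -= 0.1 * grad(w_sgd, X[idx], y[idx])

print("GD  error:", np.linalg.norm(w_gd - w_true))
print("SGD error:", np.linalg.norm(w_sgd - w_true))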

Joint work with Emmanuel Abbe, Pritish Kamath, Eran Malach and Colin Sandon
