Abstract

Modern deep neural networks are seemingly complex systems whose design and performance are at present not well understood. A first step towards demystifying this complexity is to identify limits of the system variables that are relevant in practice and, where possible, to study them theoretically. Overparameterization in width is one such limit, with practice suggesting that wider models often perform better than their narrower counterparts. I review our past work studying the limit of infinitely wide networks. This includes connecting the prior over functions in deep neural networks with Gaussian processes for fully-connected and convolutional architectures; we further consider Bayesian inference in such networks and compare it to gradient-descent-trained finite-width networks. I’ll also discuss the gradient-descent dynamics of such infinitely wide networks and their equivalence to models that are linear in their parameters. Time permitting, I’ll briefly mention work in progress studying the behavior of networks close to and far from such “kernel” limits.
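
As a rough illustration of the Gaussian-process correspondence mentioned above, one can sample many independently initialized wide networks and look at the distribution of their outputs on a few fixed inputs. The sketch below is only a minimal JAX example under assumed choices: a two-layer ReLU network in an NTK-style parameterization, with the widths, counts, and helper names (init_params, forward) chosen purely for illustration rather than taken from the work being reviewed.

```python
import jax
import jax.numpy as jnp

D_IN, WIDTH, N_NETS = 4, 2048, 1000

def init_params(key):
    # NTK-style parameterization: i.i.d. standard-normal weights,
    # with the 1/sqrt(fan_in) scaling applied in the forward pass.
    k1, k2 = jax.random.split(key)
    return {"W1": jax.random.normal(k1, (D_IN, WIDTH)),
            "W2": jax.random.normal(k2, (WIDTH, 1))}

def forward(params, x):
    h = jax.nn.relu(x @ params["W1"] / jnp.sqrt(D_IN))
    return (h @ params["W2"] / jnp.sqrt(WIDTH)).squeeze(-1)

# Sample many independently initialized wide networks and evaluate them
# on a small, fixed set of inputs. As the width grows, the induced
# prior over functions approaches a Gaussian process.
x = jax.random.normal(jax.random.PRNGKey(0), (3, D_IN))      # 3 test inputs
keys = jax.random.split(jax.random.PRNGKey(1), N_NETS)
outs = jax.vmap(lambda k: forward(init_params(k), x))(keys)   # (N_NETS, 3)

print(jnp.mean(outs, axis=0))   # close to 0: the prior has zero mean
print(jnp.cov(outs.T))          # empirical 3x3 covariance of outputs
```

The linearization mentioned at the end of the abstract can be sketched in the same spirit: a first-order Taylor expansion of the network in its parameters around initialization, which is a model that is linear in its parameters. The function name f_lin and the reuse of the helpers above are again illustrative assumptions, not the talk’s actual construction.

```python
# First-order Taylor expansion of the network in its parameters around
# an initialization params0:
#   f_lin(params) = f(params0) + <df/dparams|_{params0}, params - params0>.
# The result reviewed in the talk relates gradient-descent training of f,
# at infinite width, to training of such a linear model.
params0 = init_params(jax.random.PRNGKey(2))

def f(params):
    return forward(params, x)

def f_lin(params):
    dparams = jax.tree_util.tree_map(jnp.subtract, params, params0)
    primal_out, tangent_out = jax.jvp(f, (params0,), (dparams,))
    return primal_out + tangent_out

# For a small parameter perturbation the two functions stay close.
params = jax.tree_util.tree_map(
    lambda p: p + 1e-2 * jax.random.normal(jax.random.PRNGKey(3), p.shape),
    params0)
print(jnp.max(jnp.abs(f(params) - f_lin(params))))
```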
 

Video Recording