Abstract

A recent line of work studies overparameterized neural networks in the “kernel” regime, i.e., when the network behaves during training as a kernelized linear predictor, so that training with gradient descent amounts to minimizing an RKHS norm. This stands in contrast to other studies demonstrating how gradient descent on overparameterized multilayer networks can induce rich implicit biases that are very different from RKHS norms. In this talk, I will give an overview of recent results that illustrate the reasons behind this apparent contradiction. The goal of the talk is to partially characterize the conditions under which training overparameterized models with gradient descent exhibits “kernel”-like behavior versus “deep” behavior that yields models different from minimum-RKHS-norm predictors.
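
A minimal sketch of the linearization behind the kernel regime, assuming a model $f(x; w)$ trained by gradient descent from an initialization $w_0$ around which it stays approximately linear (the notation is illustrative, not taken from the abstract):

\[
f(x; w) \;\approx\; f(x; w_0) + \langle \nabla_w f(x; w_0),\, w - w_0 \rangle,
\qquad
K(x, x') = \langle \nabla_w f(x; w_0),\, \nabla_w f(x'; w_0) \rangle .
\]

Under this approximation, gradient descent acts on a linear predictor with feature map $\phi(x) = \nabla_w f(x; w_0)$ and tangent kernel $K$; with squared loss on data it can interpolate, it converges to a solution minimizing $\|w - w_0\|$, which corresponds to the minimum-RKHS-norm interpolant for $K$. The “deep” regime refers to settings where this linear approximation breaks down and the implicit bias is no longer an RKHS norm.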

Video Recording