Abstract: In the first tutorial, we review tools from classical statistical learning theory that are useful for understanding the generalization performance of deep neural networks. We describe uniform laws of large numbers and how they depend on the complexity of the function class of interest. We focus on one particular complexity measure, Rademacher complexity, and on upper bounds for the Rademacher complexity of deep ReLU networks. We then examine how the behavior of modern neural networks appears to conflict with the intuition developed in the classical setting.
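As a small, hedged illustration of the complexity measure named above, the sketch below estimates the empirical Rademacher complexity of the simplest class where the supremum has a closed form: norm-bounded linear functions {x ↦ ⟨w, x⟩ : ‖w‖₂ ≤ B}, not the deep ReLU networks the tutorial treats. For this class, sup over w of (1/n) Σᵢ σᵢ⟨w, xᵢ⟩ equals (B/n)‖Σᵢ σᵢxᵢ‖₂, so only the expectation over the random signs σᵢ needs Monte Carlo sampling, and Jensen's inequality bounds it by (B/n)·sqrt(Σᵢ ‖xᵢ‖²). The function name `empirical_rademacher_linear`, the use of NumPy, and the synthetic Gaussian data are assumptions made for this sketch.

```python
import numpy as np

def empirical_rademacher_linear(X: np.ndarray, B: float, trials: int = 2000, seed: int = 0) -> float:
    """Hypothetical helper: Monte Carlo estimate of the empirical Rademacher
    complexity of {x -> <w, x> : ||w||_2 <= B} on the sample X (shape n x d).

    The supremum over w is available in closed form,
        sup_{||w|| <= B} (1/n) sum_i sigma_i <w, x_i> = (B/n) ||sum_i sigma_i x_i||_2,
    so only the expectation over the Rademacher signs is sampled."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sigma = rng.choice([-1.0, 1.0], size=(trials, n))  # i.i.d. uniform +/-1 signs
    sums = sigma @ X                                    # row t holds sum_i sigma_{t,i} x_i
    return B / n * np.linalg.norm(sums, axis=1).mean()

# Synthetic sample (assumption): n = 200 points in d = 10 dimensions.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
B = 1.0
estimate = empirical_rademacher_linear(X, B)
# Jensen's inequality: E||sum_i sigma_i x_i|| <= sqrt(sum_i ||x_i||^2).
bound = B / X.shape[0] * np.sqrt((X ** 2).sum())
print(estimate, bound)
```

Both quantities shrink like 1/sqrt(n) for fixed B and bounded inputs, which is the kind of complexity-dependent rate that uniform laws of large numbers translate into generalization bounds.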
In the second tutorial, we review approaches for understanding neural network training from an optimization perspective. We begin with the classical analysis of gradient descent on convex and smooth objectives. We describe the Polyak–Łojasiewicz (PL) inequality and discuss how to interpret such an inequality in the context of neural network training. We describe a particular regime of neural network training that is well approximated by kernel methods, known as the neural tangent kernel (NTK) regime. Finally, we show how to establish a PL inequality for neural networks using two approaches: a general one based on the NTK approximation, and another specific to the setting of linearly separable data.
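To make the role of the PL inequality concrete, here is a minimal sketch on a toy one-dimensional objective rather than a neural network. If f is L-smooth and satisfies (1/2)|∇f(x)|² ≥ μ(f(x) − f*), then gradient descent with step size 1/L obeys the standard linear rate f(xₜ) − f* ≤ (1 − μ/L)ᵗ (f(x₀) − f*), even without convexity. The example function f(x) = x² + 3 sin²(x) is nonconvex with f* = 0 at x = 0; its smoothness constant L = 8 follows from |f''(x)| = |2 + 6 cos(2x)| ≤ 8, while the PL constant μ = 1/32 is a conservative value we assume here rather than derive.

```python
import math

def f(x: float) -> float:
    # Nonconvex toy objective x^2 + 3 sin^2(x); global minimum f* = 0 at x = 0.
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x: float) -> float:
    # f'(x) = 2x + 6 sin(x) cos(x) = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

L = 8.0          # smoothness constant: |f''(x)| = |2 + 6 cos(2x)| <= 8
MU = 1.0 / 32.0  # assumed (conservative) PL constant for this particular f

def gradient_descent(x0: float, steps: int) -> list:
    """Run gradient descent with step size 1/L and return f(x_t) for t = 0..steps."""
    x = x0
    values = [f(x)]
    for _ in range(steps):
        x -= grad_f(x) / L
        values.append(f(x))
    return values

values = gradient_descent(x0=3.0, steps=200)
# PL + L-smoothness rate: f(x_t) - f* <= (1 - MU/L)^t (f(x_0) - f*), with f* = 0.
bounds = [(1.0 - MU / L) ** t * values[0] for t in range(len(values))]
print(values[-1], bounds[-1])
```

In practice the iterates here converge much faster than the guaranteed rate, since near x = 0 the function is locally well conditioned; the PL bound is a worst-case guarantee that holds along the whole trajectory.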