Abstract

Given pairs (x_i, y_i) of samples x_i and (vector-valued) labels y_i drawn from some distribution, a fundamental problem in machine learning is to train a neural network to correctly classify the data. This problem can be framed as a learning problem: given pairs (x_i, y_i) with the promise that the samples are labeled by some ground-truth neural network M(x), one attempts to learn the weights of this network. In this talk, we focus on two-layer networks M(x) with a single hidden layer of rectified (e.g., ReLU) activation units f(·). Such a network is specified by weight matrices U, V, so that M(x) = U f(V x), where f is applied coordinate-wise. More generally, the labels y_i may be corrupted by noise e_i, so that we instead observe y_i = M(x_i) + e_i; the goal remains to learn the matrices U and V. We discuss state-of-the-art learning algorithms and hardness results under varying assumptions on the input and noise distributions. Our central result is a polynomial-time algorithm for Gaussian inputs and sub-Gaussian noise. In addition, we discuss a polynomial-time exact recovery algorithm for the noiseless case, and fixed-parameter tractable algorithms for more general noise distributions.
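
For concreteness, here is a minimal Python sketch of the generative model described above; it is an illustration, not the paper's algorithm, and the dimensions and noise scale are hypothetical choices. It samples Gaussian inputs x_i and produces observations y_i = M(x_i) + e_i, where the noise is drawn Gaussian as one example of a sub-Gaussian distribution.

    import numpy as np

    rng = np.random.default_rng(0)

    # Illustrative dimensions (hypothetical, not from the paper):
    # input dim d, hidden width k, output dim m, number of samples n.
    d, k, m, n = 20, 5, 10, 1000

    # Ground-truth weight matrices U (m x k) and V (k x d).
    U = rng.standard_normal((m, k))
    V = rng.standard_normal((k, d))

    def relu(z):
        # Coordinate-wise rectification f(·).
        return np.maximum(z, 0.0)

    # Gaussian inputs x_i ~ N(0, I_d), stacked as columns of X.
    X = rng.standard_normal((d, n))

    # Noiseless labels M(x_i) = U f(V x_i), then additive noise e_i
    # (here i.i.d. Gaussian, one instance of sub-Gaussian noise).
    Y_clean = U @ relu(V @ X)
    E = 0.1 * rng.standard_normal((m, n))
    Y = Y_clean + E  # observed labels y_i = M(x_i) + e_i

The learning problem is then to recover U and V (exactly in the noiseless case E = 0, approximately otherwise) given only the pairs (X, Y).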

Based on joint work with Ainesh Bakshi and David Woodruff, partly carried out at the Simons Institute [arXiv:1811.01885].