Abstract

Gradient methods, when applied to high-dimensional inference tasks, iteratively optimize a loss function to obtain an estimator of an unknown parameter. The corresponding loss functions are random high-dimensional functions and are often highly non-convex. Such algorithms can be viewed as having two stages: a search phase, in which the algorithm wanders in a non-convex landscape, and a descent phase, in which the landscape is essentially convex and the algorithm converges to a minimum of the population loss. We focus on online stochastic gradient descent and study the computational difficulty, i.e., the number of samples required as the dimension grows, of each of these two stages in a family of high-dimensional inference tasks that includes phase retrieval, online PCA, tensor PCA, and supervised learning for single-layer networks. It turns out that the descent phase is always rapid, and gradient methods perform well there with only a linear number of samples. By contrast, we find that from random (uninformative) initializations, the search phase may require a much larger (though still polynomial) number of samples; thus it is the search phase that governs the sample complexity of gradient methods for these problems. Furthermore, we identify an intrinsic geometric quantity, depending only on the expectation of the loss (i.e., the population loss), which dictates the amount of time the algorithm spends in the search phase before reaching the descent phase and thereafter rapidly succeeding at the estimation task. Based on joint work with G. Ben Arous and A. Jagannath.
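The two-stage picture can be illustrated on one of the tasks the abstract mentions, noiseless phase retrieval. The sketch below is an assumption-laden toy example, not the authors' setup: the dimension, step size, sample count, and loss (a quartic least-squares loss for observations $y = (x \cdot v)^2$) are all illustrative choices. Tracking the overlap $m_t = \langle u_t, v \rangle$ shows a long plateau near $m \approx 1/\sqrt{d}$ (the search phase) followed by quick convergence toward $|m| \approx 1$ (the descent phase).

```python
import numpy as np

def online_sgd_phase_retrieval(d=20, n_samples=400_000, step=1e-3, seed=0):
    """Toy online SGD for noiseless phase retrieval y = (x . v)^2.

    One fresh Gaussian sample is used per iteration, and the iterate is
    kept on the unit sphere by renormalizing after each step.  Returns
    the trajectory of the overlap m_t = <u_t, v>; its slow escape from
    m ~ 1/sqrt(d) and fast convergence afterwards mirror the search and
    descent phases described in the abstract.  All parameters here are
    illustrative assumptions, not values from the talk.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(d)
    v[0] = 1.0                                 # ground-truth unit vector
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)                     # random (uninformative) init
    overlaps = np.empty(n_samples)
    for t in range(n_samples):
        x = rng.standard_normal(d)             # one fresh sample per step
        y = (x @ v) ** 2                       # phaseless observation
        p = x @ u
        # Gradient of the per-sample loss (1/4)(y - (x.u)^2)^2 in u
        # is -(y - p^2) * p * x, so a descent step adds its negative:
        u += step * (y - p * p) * p * x
        u /= np.linalg.norm(u)                 # project back to the sphere
        overlaps[t] = u @ v
    return overlaps

overlaps = online_sgd_phase_retrieval()
print(f"initial overlap {overlaps[0]:+.3f}, final overlap {overlaps[-1]:+.3f}")
```

Since the observations only determine $v$ up to sign, the overlap converges to $\pm 1$; plotting `overlaps` against the iteration count makes the plateau-then-descent shape visible.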

Video Recording