Recent years have seen astounding progress in both the theory and practice of nonconvex optimization. Carefully designed nonconvex procedures simultaneously achieve optimal statistical accuracy and computational efficiency for many problems. Due to the highly nonconvex landscape, state-of-the-art results often require explicit regularization procedures (e.g., trimming, projection, or extra penalization) to guarantee fast convergence. For vanilla algorithms, however, prior theory usually suggests conservative step sizes in order to avoid overshooting.
This talk uncovers a striking phenomenon: even in the absence of explicit regularization, nonconvex gradient descent enforces proper regularization automatically and implicitly under a large family of statistical models. In fact, the vanilla nonconvex procedure follows a trajectory that always stays within a region of benign geometry. This "implicit regularization" feature allows the algorithm to take far more aggressive steps without overshooting, which in turn enables faster convergence. We will discuss several concrete fundamental problems, including phase retrieval, matrix completion, blind deconvolution, and recovery of structured probability matrices; these findings might shed light on the effectiveness of nonconvex optimization for solving more general structured recovery problems.
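To make the phase retrieval instance concrete, here is a minimal NumPy sketch of vanilla gradient descent (no trimming, projection, or extra penalization) with spectral initialization and a constant step size. This is an illustrative toy, not the exact algorithm or step-size schedule analyzed in the talk; the dimensions, iteration count, and step size `eta` are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Problem setup: recover x from m quadratic measurements y_i = (a_i^T x)^2.
n, m = 50, 1000                      # illustrative signal dimension and sample size
x = rng.standard_normal(n)
x /= np.linalg.norm(x)               # ground truth, normalized to unit norm
A = rng.standard_normal((m, n))      # Gaussian design
y = (A @ x) ** 2                     # phaseless measurements

# Spectral initialization: leading eigenvector of (1/m) sum_i y_i a_i a_i^T,
# rescaled using E[y_i] = ||x||^2 as a norm estimate.
Y = (A.T * y) @ A / m
_, eigvecs = np.linalg.eigh(Y)
z = eigvecs[:, -1] * np.sqrt(np.mean(y))

# Vanilla gradient descent on f(z) = (1/4m) sum_i ((a_i^T z)^2 - y_i)^2
# with a constant, relatively aggressive step size.
eta = 0.1
for _ in range(500):
    Az = A @ z
    grad = A.T @ ((Az ** 2 - y) * Az) / m
    z -= eta * grad

# Estimation error up to the inherent global sign ambiguity.
dist = min(np.linalg.norm(z - x), np.linalg.norm(z + x))
print(dist)
```

In a typical run with this oversampling ratio (m/n = 20), the iterates converge linearly to the ground truth (up to sign), despite the absence of any explicit regularization step.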