Abstract

As the scale and complexity of machine learning algorithms continue to increase, methods for fast and efficient optimization are becoming increasingly important. Gradient methods have proven to be central both in theory and in practice for a wide range of machine learning tasks. A thorough understanding of generalization performance of iterative algorithms is as crucial as those for their convergence. Algorithmic stability is a powerful tool for analyzing generalization performance of learning algorithms, leading to better bounds than classical uniform convergence analysis in many cases.

In this talk, we propose a new stability-based framework for deriving convergence lower bounds for a general class of iterative optimization algorithms. In particular, we characterize the trade-off between algorithmic stability and optimal convergence rate of iterative methods through information theoretic arguments. And it is shown that stable algorithms can not converge too fast. Further, we derive algorithmic stability bounds in the convex and smooth setting for many first order methods such as gradient descent and momentum methods. Applying this framework, we translate these stability bounds into lower bounds for convergence, of which some are believed to be new. These lower bounds match the well established upper bounds up to constants. Finally, the stability bounds observed in several numerical experiments agree with our theoretical findings. (This talk is based on joint work with Yuansi Chen (Statistics, UCB) and Chi Jin (CS, UCB)).

Video Recording