Abstract

In contrast to the scientific computing community, which has wholeheartedly embraced second-order optimization algorithms, the machine learning (ML) community has long nurtured a distaste for such methods in favour of first-order alternatives. When implemented naively, however, second-order methods are clearly not computationally competitive. This, in turn, has unfortunately led to the conventional wisdom that these methods are not appropriate for large-scale ML applications. In this series of talks, we will provide an overview of various second-order optimization methods and their stochastic variants. We will demonstrate the theoretical properties as well as the empirical performance of a variety of efficient Newton-type algorithms for both convex and non-convex problems. In the process, we will highlight the disadvantages of first-order methods and, in light of them, showcase the practical advantages offered by the appropriate application of second-order information.
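
To make the idea of a stochastic Newton-type method concrete, here is a minimal illustrative sketch, not taken from the talks themselves: a sub-sampled Newton step for L2-regularized logistic regression, where the gradient is exact but the Hessian is estimated on a random mini-batch. All names and parameter choices (e.g. `reg`, `hess_batch`) are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_grad(w, X, y, reg):
    """Full gradient of the L2-regularized logistic loss."""
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y) + reg * w

def subsampled_hessian(w, X, reg, batch_idx):
    """Hessian estimated on a random subset of rows (the 'stochastic' part)."""
    Xs = X[batch_idx]
    p = sigmoid(Xs @ w)
    d = p * (1 - p)                                   # per-sample curvature weights
    return (Xs.T * d) @ Xs / len(batch_idx) + reg * np.eye(X.shape[1])

def newton_step(w, X, y, reg, hess_batch, rng):
    g = loss_grad(w, X, y, reg)
    idx = rng.choice(len(y), size=hess_batch, replace=False)
    H = subsampled_hessian(w, X, reg, idx)
    return w - np.linalg.solve(H, g)                  # dense solve here; CG at scale

# Tiny synthetic demo
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
y = (X @ rng.standard_normal(10) > 0).astype(float)
w = np.zeros(10)
for _ in range(10):
    w = newton_step(w, X, y, reg=1e-3, hess_batch=200, rng=rng)
print("final gradient norm:", np.linalg.norm(loss_grad(w, X, y, 1e-3)))
```

In practice, methods of this flavour replace the dense solve with a few conjugate-gradient iterations using Hessian-vector products, which is what keeps the per-iteration cost comparable to first-order methods.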

Video Recording