Abstract
Much empirical work in deep learning has gone into avoiding vanishing gradients, a necessary condition for the success of stochastic gradient methods. This raises the question of whether we can provably rule out vanishing gradients for some expressive model architectures. I will point out several obstacles, as well as positive results for some simplified architectures, specifically linearized residual networks and linear dynamical systems.
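As a small numerical illustration of the phenomenon the talk addresses (this sketch is not from the talk itself, and the depth, width, and scale parameters are illustrative choices): in a deep linear network, the end-to-end map is a product of weight matrices, and when each layer has spectral norm below one, the product, and with it the gradient signal, shrinks geometrically with depth. A linearized residual network parameterizes each layer as I + A with small A, which keeps the product well-conditioned.

```python
import numpy as np

rng = np.random.default_rng(0)
d, depth = 10, 50  # illustrative width and depth

# Deep linear network: end-to-end map is W_depth @ ... @ W_1.
# With small i.i.d. Gaussian layers, the product collapses with depth.
plain = np.eye(d)
for _ in range(depth):
    W = rng.normal(scale=0.1, size=(d, d))
    plain = W @ plain

# Linearized residual layers I + A with small A: each factor is
# close to the identity, so the product stays well-conditioned.
resid = np.eye(d)
for _ in range(depth):
    A = rng.normal(scale=0.01, size=(d, d))
    resid = (np.eye(d) + A) @ resid

plain_norm = np.linalg.norm(plain)   # shrinks geometrically in depth
resid_norm = np.linalg.norm(resid)   # stays bounded away from zero
print(plain_norm, resid_norm)
```

The contrast in the two norms is the toy version of the vanishing-gradient obstacle: the gradient with respect to an early layer contains the product of all later layer matrices, so if that product collapses, so does the gradient.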
Based on joint work with Ma and Recht.