Efficient Optimization of a Convolutional Network with Gaussian Inputs

Deep learning models are often successfully trained using gradient descent, despite the worst-case hardness of the underlying non-convex optimization problem. The key question, then, is under what conditions one can prove that optimization will succeed. Here we provide, for the first time, a result of this kind for a one-hidden-layer ConvNet with no overlap and ReLU activation. For this architecture we show that learning is hard in the general case, but that when the input distribution is Gaussian, gradient descent converges to the global optimum in polynomial time. I will additionally discuss an alternative approach to sidestepping the complexity of deep learning optimization using improper learning.
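To make the setting concrete, here is a minimal, hypothetical sketch (not the authors' code) of the architecture described above: a one-hidden-layer ConvNet whose non-overlapping input patches share a single ReLU filter, followed by average pooling, trained by gradient descent on Gaussian inputs in a teacher-student setup. All names, constants, and the teacher-student framing are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
k, d = 4, 3                          # k disjoint patches, each of width d
w_star = rng.standard_normal(d)      # "teacher" filter generating the labels

def forward(w, X):
    # X has shape (n, k*d); split into k non-overlapping patches,
    # apply the shared filter w, a ReLU, then average the hidden units.
    patches = X.reshape(len(X), k, d)
    return np.maximum(patches @ w, 0.0).mean(axis=1)

def loss(w, X, y):
    # squared loss between student and teacher outputs
    return 0.5 * np.mean((forward(w, X) - y) ** 2)

def grad(w, X, y):
    # gradient of the squared loss w.r.t. the shared filter w,
    # chaining through the ReLU via its active-set indicator
    patches = X.reshape(len(X), k, d)
    pre = patches @ w                  # pre-activations, shape (n, k)
    err = forward(w, X) - y            # residuals, shape (n,)
    return ((err[:, None] * (pre > 0))[:, :, None] * patches).mean(axis=(0, 1))

w = 0.1 * rng.standard_normal(d)               # small random initialization
X_eval = rng.standard_normal((4096, k * d))    # held-out Gaussian inputs
y_eval = forward(w_star, X_eval)
init_loss = loss(w, X_eval, y_eval)

for _ in range(2000):
    X = rng.standard_normal((256, k * d))      # fresh Gaussian batch each step
    w -= 0.5 * grad(w, X, forward(w_star, X))

final_loss = loss(w, X_eval, y_eval)
print(float(init_loss), float(final_loss))
```

In this toy run the loss on held-out Gaussian data decreases under plain gradient descent, illustrating the benign optimization landscape that the Gaussian-input assumption buys for the no-overlap architecture; the step size and patch counts are arbitrary choices, not values from the result itself.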

