Abstract
Training neural networks is a difficult non-convex optimization problem with possibly numerous local optima and saddle points. However, empirical evidence suggests that simple gradient-based algorithms are effective in practice. In this work, we analyze the properties of stationary points for training one-hidden-layer neural networks with ReLU activation functions and show that, under some conditions on the neural weights, a stationary point is a global optimum with high probability. Moreover, we introduce semi-random units, whose activation pattern is determined by a random projection of the input, and show that networks with these units are guaranteed to converge to a global optimum with high probability.
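As one possible reading of the semi-random unit described above (this explicit form is our sketch and is not stated in the abstract itself; the paper's definition may differ in details), the activation gate is set by a fixed random vector $r$ while the output value depends on a trainable weight vector $w$:
\[
  s(x; w) \;=\; \mathbb{1}\{r^\top x > 0\}\,\bigl(w^\top x\bigr),
\]
so the unit remains piecewise linear in the input $x$, but on each activation region it is linear in the trainable parameter $w$.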