Abstract
Optimization of high-dimensional non-convex models has always been a difficult and fascinating problem. Since our minds tend to apply notions that we experienced and naturally learned in low dimensions, our intuition is often led astray.
These problems arise naturally and become increasingly relevant, particularly in an era where ever larger amounts of data are available. Most of the information we receive is useless, and identifying what is relevant is an intricate problem.
Machine learning and inference problems often fall into this setting.
In both cases we have a cost function that depends on a large number of parameters to be optimized. A simple but common choice is to use gradient-based algorithms, which descend the cost function in search of good solutions.
If the cost function is convex, then under mild conditions on the descent rate we are guaranteed to find the optimal solution. However, the costs we encounter are often non-convex.
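As a concrete illustration (not taken from the references below), here is a minimal sketch of plain gradient descent on a toy high-dimensional non-convex cost; the cost, the descent rate `lr`, and the dimension `N` are placeholders chosen for illustration, not the spiked matrix-tensor setting studied in the talk.

```python
import numpy as np

def gradient_descent(grad, x0, lr=1e-2, n_steps=10_000):
    """Plain (vanilla) gradient descent: repeatedly step against the gradient.

    grad    : function returning the gradient of the cost at x
    x0      : initial point (here, a random high-dimensional vector)
    lr      : descent rate (step size)
    n_steps : number of iterations
    """
    x = x0.copy()
    for _ in range(n_steps):
        x -= lr * grad(x)
    return x

# Toy non-convex cost: sum_i (x_i^2 - 1)^2, a "double well" per coordinate.
grad_cost = lambda x: 4.0 * x * (x**2 - 1.0)

N = 1000
rng = np.random.default_rng(0)
x_hat = gradient_descent(grad_cost, rng.standard_normal(N))
```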
In the talk: We characterize the dynamics of gradient descent and of the Langevin algorithm, identifying the algorithmic thresholds. We analyze the structure of the landscape and find the counter-intuitive result that, in general, an exponential number of spurious minima does not prevent randomly initialized vanilla gradient descent from finding the unique good solution. We provide a quantitative and qualitative explanation of the underlying phenomenon.
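For readers unfamiliar with the Langevin algorithm mentioned above, a minimal sketch follows: an Euler-Maruyama discretization of gradient flow plus thermal noise. The temperature, step size, and cost are illustrative assumptions, not the spiked matrix-tensor Hamiltonian analyzed in the references.

```python
import numpy as np

def langevin(grad, x0, dt=1e-3, temperature=0.1, n_steps=100_000, seed=0):
    """Discretized Langevin dynamics: gradient descent plus thermal noise.

    Each step follows the negative gradient and adds Gaussian noise of
    variance 2 * temperature * dt (Euler-Maruyama discretization).
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x += -dt * grad(x) + np.sqrt(2.0 * temperature * dt) * noise
    return x

# Example: run Langevin dynamics on the same toy double-well cost as above.
grad_cost = lambda x: 4.0 * x * (x**2 - 1.0)
x_hat = langevin(grad_cost, np.random.default_rng(1).standard_normal(1000))
```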
Refs.
* Thresholds of descending algorithms in inference problems. SSM, Zdeborova. J. Stat. Mech. 2020(3):034004;
* Marvels and pitfalls of the Langevin algorithm in noisy high-dimensional inference. SSM, Biroli, Cammarota, Krzakala, Urbani, Zdeborova. PRX 10, 011057;
* Who is afraid of big bad minima? Analysis of gradient-flow in spiked matrix-tensor models. SSM, Biroli, Cammarota, Krzakala, Urbani, Zdeborova. NeurIPS 2019;
* Passed & Spurious: Descent algorithms and local minima in spiked matrix-tensor models. SSM, Krzakala, Urbani, Zdeborova. ICML 2019.