Abstract
It is known that arbitrary poly-size neural networks trained by GD/SGD can learn any class learnable in SQ/PAC. However, this is not expected to hold for more regular architectures and initializations. Recently, the staircase property emerged as a condition that appears to be both necessary and sufficient for certain regular networks to learn with high accuracy, with the positive result established for sparse homogeneous initializations. In this talk, we show that standard two-layer architectures can also learn staircase functions, with features being learned over time. We also show that kernel methods cannot learn staircases of growing degree. Joint work with Enric Boix-Adsera and Theodor Misiakiewicz.
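For concreteness, a canonical example from this line of work (recalled here for illustration; the abstract itself does not fix a definition) is the degree-$k$ staircase function on the hypercube $\{-1,+1\}^d$,
$$S_k(x) \;=\; x_1 + x_1 x_2 + x_1 x_2 x_3 + \cdots + x_1 x_2 \cdots x_k,$$
where each monomial extends the previous one by a single new coordinate, so that lower-degree terms can guide the learning of higher-degree ones; "staircases of growing degree" refers to letting $k$ grow with the dimension $d$.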