Abstract

Is mathematics invented or discovered? Recent progress in formulating fundamental principles underlying the stunning success of deep learning (DL) sheds new light on this age-old question. I will discuss why deep networks seem to escape the curse of dimensionality. The answer lies in a key property of all functions that are efficiently Turing computable: they are compositionally sparse. This property enables the effective use of deep (and sparse) networks, the engine powering deep learning and related systems such as LLMs. It is, however, difficult to select a "good" decomposition that exploits sparse compositionality: each efficiently Turing computable function admits many sparse decompositions. How, then, can deep networks learn reusable sparse decompositions? One way is to impose a curriculum, similar to a chain of thought, in which the constituent functions are common across different tasks.
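As a purely illustrative sketch (not part of the abstract), the following Python snippet shows what a compositionally sparse function could look like: a function of eight variables expressed as a binary tree of two-argument constituents, so that a network mirroring the tree only ever has to represent low-dimensional pieces. The names `h` and `f_sparse` are hypothetical and chosen for illustration only.

```python
import numpy as np

def h(a, b):
    # A hypothetical two-argument constituent function: each node of the
    # composition depends on only two inputs, i.e. it is "sparse".
    return np.tanh(a * b + 0.5 * (a - b))

def f_sparse(x):
    # Compositionally sparse target on 8 variables, structured as a binary tree:
    # f(x1,...,x8) = h(h(h(x1,x2), h(x3,x4)), h(h(x5,x6), h(x7,x8)))
    l1 = [h(x[0], x[1]), h(x[2], x[3]), h(x[4], x[5]), h(x[6], x[7])]
    l2 = [h(l1[0], l1[1]), h(l1[2], l1[3])]
    return h(l2[0], l2[1])

x = np.random.randn(8)
print(f_sparse(x))
```

Because every constituent in such a decomposition is a low-dimensional function, a deep network whose connectivity mirrors the tree only needs to approximate these small pieces, which is the usual way the escape from the curse of dimensionality is explained.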