The sparse coding model has been shown to provide a good account of neural response properties at early stages of sensory processing. However, despite several promising efforts [1,2,3], it remains unclear how to exploit the structure in a sparse code for learning higher-order structure at later stages of processing. Here I shall argue that the key lies in understanding how continuous transformations in the signal space are expressed in the elements of a sparse code, and in deriving the proper computations that disentangle these transformations from the underlying invariances [4]. I shall present a new signal representation framework, called the sparse manifold transform (SMT), that exploits temporally persistent structure in the input (similar to slow feature analysis) in order to turn non-linear transformations in the signal space into linear interpolations in a representational embedding space. The SMT thus provides a way to progressively flatten manifolds, allowing higher forms of structure to be learned at each successive stage of processing. The SMT also provides a principled way to derive the pooling layers commonly used in deep networks, and since the transform is approximately invertible, dictionary elements learned at any level in the hierarchy may be directly visualized. Possible neural substrates and mechanisms of the SMT shall be discussed.
With Yubei Chen.
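To make the central idea concrete: one way to turn temporal persistence into linear interpolation is to find an embedding of the sparse coefficients in which each frame lies close to the average of its temporal neighbors. The toy sketch below (not the actual SMT algorithm presented in the talk; the data, dimensions, and formulation here are illustrative assumptions) poses this as a generalized eigenproblem, minimizing a second-difference penalty on the embedded trajectory subject to a whitening constraint.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Toy "sparse code" trajectory: N dictionary elements over T frames.
# In practice these would come from sparse coding of a video sequence.
T, N, K = 200, 50, 3          # K = assumed embedding dimensionality
A = np.abs(rng.standard_normal((N, T)))   # placeholder coefficients

# Second-difference operator: column t computes
# a_{t} - 2 a_{t+1} + a_{t+2}, which is zero exactly when the
# trajectory is a linear interpolation between its neighbors.
D = np.zeros((T, T - 2))
for t in range(T - 2):
    D[t, t], D[t + 1, t], D[t + 2, t] = 1.0, -2.0, 1.0

# Minimize ||P A D||_F^2 subject to P (A A^T) P^T = I.
# The K smallest generalized eigenvectors give the embedding P.
C = A @ D @ D.T @ A.T                  # curvature (roughness) matrix
B = A @ A.T + 1e-6 * np.eye(N)         # regularized coefficient covariance
w, V = eigh(C, B, subset_by_index=[0, K - 1])
P = V.T                                # K x N embedding matrix

beta = P @ A                           # embedded trajectory, K x T
```

In the embedded coordinates `beta`, each frame is (as nearly as the data allow) a linear blend of the frames before and after it, which is the sense in which the manifold has been "flattened" one step.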
[1] Karklin, Y., & Lewicki, M. S. (2003). Learning higher-order structures in natural images. Network: Computation in Neural Systems, 14(3), 483-499.
[2] Hosoya, H., & Hyvärinen, A. (2015). A hierarchical statistical model of natural images explains tuning properties in V2. Journal of Neuroscience, 35(29), 10412-10428.
[3] Le, Q. V. (2012). Building high-level features using large scale unsupervised learning. In Proceedings of the 29th International Conference on Machine Learning.
[4] DiCarlo, J. J., & Cox, D. D. (2007). Untangling invariant object recognition. Trends in Cognitive Sciences, 11(8), 333-341.