Zico Kolter (CMU / Bosch)
The typical view of deep learning architectures is that they consist of stacked linear operators followed by (usually elementwise) nonlinear operators, with additions such as residual connections and attention units. However, a great deal of recent work has looked at integrating substantially more structured layers into deep architectures: explicit optimization solvers, ODE solvers, physical simulations, and many other examples. In this view, any differentiable program can serve as a layer in a deep network, and by properly structuring these problems we can encode a great deal of prior knowledge into the system. In this talk, I will present the basic approach behind these structured layers and highlight some recent advances in the area, particularly the incorporation of discrete optimization solvers as layers in deep networks. Despite their potential advantages, however, these structured layers raise a number of challenges, especially regarding gradient-based training of the full system. I will discuss these challenges and potential ways forward.
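To make the idea concrete, here is a minimal, hypothetical sketch (not the speaker's actual method) of a layer defined as the solution of an optimization problem. The layer solves argmin_z 0.5·||z − x||² + λ·||z||₁, whose closed-form solution is the soft-thresholding operator; because the solution map is (almost everywhere) differentiable in the input, the layer can be backpropagated through like any other. The function names and the choice of this particular problem are illustrative assumptions.

```python
import numpy as np

def prox_l1(x, lam):
    # "Optimization layer": returns argmin_z 0.5*||z - x||^2 + lam*||z||_1.
    # The closed-form minimizer is elementwise soft-thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_l1_grad(x, lam):
    # Jacobian of the layer's output w.r.t. its input x is diagonal:
    # 1 where |x| > lam (the solution moves with x), 0 where it is clamped.
    # This is the quantity a backward pass would use.
    return (np.abs(x) > lam).astype(float)

x = np.array([-2.0, -0.5, 0.3, 1.5])
y = prox_l1(x, 1.0)        # forward pass through the layer -> [-1. 0. 0. 0.5]
g = prox_l1_grad(x, 1.0)   # analytic derivative for gradient-based training
```

For layers without a closed-form solution (e.g., general quadratic or discrete optimization solvers), the same pattern applies, but the derivative is obtained implicitly, by differentiating the solver's optimality conditions rather than the solution formula, which is where the training challenges mentioned above arise.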