Abstract: Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks. A widely accepted explanation for the success of these architectures is that they encode hypothesis classes that are suitable for natural images. However, understanding the precise interplay between approximation and generalization in convolutional architectures remains a challenge. 

In this talk, we consider the stylized setting of covariates (image pixels) uniformly distributed on the hypercube, and fully characterize the RKHS of kernels composed of single layers of convolution, pooling, and downsampling operations. We then study the gain in sample efficiency of kernel methods using these kernels over standard inner-product kernels. In particular, we show that 1) the convolution layer breaks the curse of dimensionality by restricting the RKHS to "local" functions; 2) global average pooling constrains the learned function to be translation invariant; 3) local pooling biases learning towards low-frequency functions. Notably, our results quantify how choosing an architecture adapted to the target function leads to a large improvement in sample complexity.
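As a rough illustration of the three operations named above, the following is a minimal sketch of a one-layer convolutional kernel on one-dimensional cyclic "images". It is not the speaker's exact definition: the function names `cyclic_patches` and `conv_kernel`, the choice h = exp, the 1/q scaling, and the normalization are all assumptions made for this example. With `pool=1` only patches at the same position are compared (convolution alone), with `pool=d` all pairs of positions are averaged (global average pooling), and intermediate values correspond to local pooling.

```python
import numpy as np

def cyclic_patches(x, q):
    """Return the d cyclic patches of length q; row k is (x[k], ..., x[k+q-1]), indices mod d."""
    d = x.shape[0]
    idx = (np.arange(d)[:, None] + np.arange(q)[None, :]) % d
    return x[idx]

def conv_kernel(x, y, q, h=np.exp, pool=1):
    """Hypothetical one-layer convolutional kernel with cyclic average pooling.

    pool=1 : no pooling, compares patches at the same position only
    pool=d : global average pooling, compares all pairs of positions
    h      : inner-product kernel applied to patch pairs (exp is an assumed choice)
    """
    d = x.shape[0]
    px, py = cyclic_patches(x, q), cyclic_patches(y, q)
    gram = h(px @ py.T / q)  # gram[k, k'] = h(<x_(k), y_(k')> / q)
    total = 0.0
    for s in range(pool):
        for t in range(pool):
            # trace of the rolled matrix = sum over k of gram[(k+s) % d, (k+t) % d]
            total += np.trace(np.roll(gram, shift=(-s, -t), axis=(0, 1)))
    return total / (d * pool**2)

# Example inputs: random sign vectors (an assumption for illustration)
rng = np.random.default_rng(0)
d, q = 8, 3
x = rng.choice([-1.0, 1.0], size=d)
y = rng.choice([-1.0, 1.0], size=d)

k_conv = conv_kernel(x, y, q, pool=1)                      # convolution only
k_local = conv_kernel(x, y, q, pool=2)                     # local pooling, window 2
k_global = conv_kernel(x, y, q, pool=d)                    # global average pooling
k_global_shifted = conv_kernel(np.roll(x, 3), y, q, pool=d)
print(k_conv, k_local, k_global, k_global_shifted)
```

In this sketch the globally pooled kernel reduces to the mean of all patch-pair evaluations, so cyclically shifting one input leaves its value unchanged. This is the kernel-level counterpart of point 2) above; the unpooled kernel is in general only invariant when both inputs are shifted together.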


Video Recording