Videos from Workshops

Playlist: 17 videos

Deep Learning Theory Workshop and Summer School

1:13:16

Deep Learning in Structural Biology and Protein Design: How, Where, and Why

Chloe Hsu (UC Berkeley)
https://simons.berkeley.edu/talks/tutorial-deep-learning-applications-structural-biology-and-protein-engineering
Deep Learning Theory Workshop and Summer School

Tutorial: Deep Learning Applications in Structural Biology and Protein Engineering

Abstract: There are about 20,000 different proteins in each of us humans. These proteins carry out a diverse set of functions to keep us all alive and healthy. Recently, deep learning has been increasingly used both to 1) help us visualize and gain insights into naturally existing proteins and 2) design novel proteins for therapeutic and environmental applications. In this talk, we will take a deep dive into the inner workings of AlphaFold2 and other emerging deep learning methods in structural biology and protein design. We will also examine the assumptions on biological data distributions and discuss hypotheses for the crucial ingredients of successful deep learning applications.

1:12:25

When is Scale Enough?

Ethan Dyer (Google Research, Blueshift Team)
https://simons.berkeley.edu/talks/tutorial-emergent-behaviors-deep-learning
Deep Learning Theory Workshop and Summer School

Abstract: Deep learning continues its march of performance progress as models and datasets are scaled up. This talk will discuss work investigating performance predictability with model, dataset, and compute scale for deep learning in general and large language models in particular. I will review scaling in linear models -- a simple analytic system exhibiting many of the phenomena characteristic of realistic networks. I will also discuss empirical work attempting to investigate what types of problems can practically be solved by scale alone and what types cannot.
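
As a rough illustration of the kind of scaling-curve analysis mentioned in the abstract (a sketch, not material from the talk), the following fits a saturating power law loss(N) = a * N^(-b) + c to hypothetical loss-versus-model-size measurements; the data points and starting values are invented for illustration.

# Sketch (not from the talk): fit a saturating power law to hypothetical
# loss-vs-parameter-count measurements and extrapolate to a larger model.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    # loss decays as a * n^(-b) toward an irreducible floor c
    return a * n ** (-b) + c

# Hypothetical measurements: model sizes (parameters) and held-out losses.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8])
losses = np.array([3.10, 2.74, 2.45, 2.21, 2.02, 1.88])

params, _ = curve_fit(power_law, sizes, losses, p0=[10.0, 0.1, 1.0], maxfev=10000)
a, b, c = params
print(f"fit: loss(N) ~= {a:.2f} * N^(-{b:.3f}) + {c:.2f}")
# Extrapolate to a larger model, the kind of predictability question the talk examines.
print("predicted loss at N = 1e9:", power_law(1e9, a, b, c))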

0:57:45

Universality of Approximate Message Passing on Semi-random Matrices

Subhabrata Sen (Harvard)
https://simons.berkeley.edu/node/21940
Deep Learning Theory Workshop and Summer School

Approximate Message Passing (AMP) is a class of efficient iterative algorithms that have been extensively used for signal recovery in high-dimensional inference problems. Each iteration involves a matrix-vector product, followed by a coordinate-wise application of a non-linear map to the resulting vector. The main attraction of AMP is that the limiting empirical distributions of its iterates are Gaussian, with means and variances characterized by a low-dimensional recursion known as state evolution. These guarantees are usually derived under very specific distributional assumptions on the matrix, e.g., i.i.d. Gaussian entries or orthogonally invariant matrices. However, numerical investigations indicate that AMP algorithms exhibit a remarkable degree of universality with respect to the data distribution. We will discuss the universality of AMP algorithms on a class of semi-random matrices, which can be significantly less random than matrices with i.i.d. entries. Time permitting, I will discuss the implications for statistical learning problems.

This is based on joint work with Rishabh Dudeja and Yue Lu (Harvard).
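
To make the iteration structure described in the abstract concrete, here is a minimal illustrative sketch (not code from the talk) of an AMP recursion for a rank-one signal observed through a symmetric matrix with i.i.d. Gaussian noise, i.e., the standard setting rather than the semi-random one studied in the talk. The non-linearity, signal strength, and initialization are arbitrary choices for illustration.

# Illustrative AMP iteration: matrix-vector product, coordinate-wise
# non-linearity, and Onsager correction (standard i.i.d. Gaussian setting,
# not the semi-random matrices of the talk).
import numpy as np

rng = np.random.default_rng(0)
n = 2000
v = rng.choice([-1.0, 1.0], size=n)            # hidden +/-1 signal
W = rng.normal(size=(n, n)) / np.sqrt(n)
W = (W + W.T) / np.sqrt(2)                     # symmetric Gaussian noise matrix
snr = 2.0                                      # signal strength (arbitrary)
A = snr * np.outer(v, v) / n + W               # observed matrix: rank-one spike + noise

f = np.tanh                                    # coordinate-wise non-linear map
f_prime = lambda x: 1.0 - np.tanh(x) ** 2

x_prev = np.zeros(n)
x = 0.3 * v + rng.normal(size=n)               # weakly informative start, for illustration
for t in range(15):
    onsager = f_prime(x).mean() * f(x_prev)    # Onsager correction term
    x, x_prev = A @ f(x) - onsager, x          # one AMP step
    print(t, "overlap with signal:", abs(f(x) @ v) / n)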

1:00:06

A Theoretical Framework of Convolutional Kernels on Image Datasets

Song Mei (UC Berkeley)
https://simons.berkeley.edu/node/21939
Deep Learning Theory Workshop and Summer School

Recent empirical work has shown that hierarchical convolutional kernels inspired by convolutional neural networks (CNNs) significantly improve the performance of kernel methods in image classification tasks. A widely accepted explanation for the success of these architectures is that they encode hypothesis classes that are suitable for natural images. However, understanding the precise interplay between approximation and generalization in convolutional architectures remains a challenge.

In this talk, we consider a stylized setting for the covariates (image pixels) and fully characterize the RKHS of kernels composed of single layers of convolution, pooling, and downsampling operations. We then study the gain in sample efficiency of kernel methods using these kernels over standard inner-product kernels. In particular, we show that 1) the convolution layer breaks the curse of dimensionality by restricting the RKHS to 'local' functions; 2) global average pooling constrains the learned function to be translation invariant; 3) local pooling biases learning towards low-frequency functions. Notably, our results quantify how choosing an architecture adapted to the target function leads to a large improvement in sample complexity.
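
As a toy illustration of the convolution and pooling operations discussed in the abstract (a sketch under simplifying assumptions, not the talk's exact construction), the following compares a one-layer convolutional kernel with and without global average pooling on 1-D "images"; the base patch kernel, patch size, and inputs are arbitrary.

# Sketch: one-layer convolutional kernels on 1-D signals, built from a base
# inner-product kernel on local patches, with and without global average pooling.
import numpy as np

def patches(x, q):
    # all contiguous length-q patches of a 1-D signal, with circular wrapping
    return np.stack([np.roll(x, -i)[:q] for i in range(len(x))])

def base_kernel(u, w):
    # simple inner-product kernel on a pair of patches (arbitrary choice)
    return (1.0 + u @ w) ** 2

def conv_kernel(x, y, q):
    # convolution only: compare patches at the same location ("local" functions)
    px, py = patches(x, q), patches(y, q)
    return np.mean([base_kernel(px[i], py[i]) for i in range(len(px))])

def conv_pool_kernel(x, y, q):
    # convolution followed by global average pooling: compare all pairs of
    # locations, which makes the induced function class translation invariant
    px, py = patches(x, q), patches(y, q)
    return np.mean([[base_kernel(u, w) for w in py] for u in px])

rng = np.random.default_rng(0)
x = rng.normal(size=16)
y = np.roll(x, 3)                                            # translated copy of x
print(conv_kernel(x, x, 4), conv_kernel(x, y, 4))            # generally differ
print(conv_pool_kernel(x, x, 4), conv_pool_kernel(x, y, 4))  # equal up to floating point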

1:07:31

Tutorial: Methods from Statistical Physics III

Ahmed El Alaoui (Cornell)
https://simons.berkeley.edu/talks/methods-statistical-physics-iii
Deep Learning Theory Workshop and Summer School

0:57:15

A New Perspective on High-Dimensional Causal Inference

Pragya Sur (Harvard)
https://simons.berkeley.edu/node/21934
Deep Learning Theory Workshop and Summer School

1:00:41

Distribution Shift as Underspecification, and What We Might Do About It

Chelsea Finn (Stanford)
https://simons.berkeley.edu/node/21935
Deep Learning Theory Workshop and Summer School

1:06:17

Tutorial: Methods from Statistical Physics II

Ahmed El Alaoui (Cornell)
https://simons.berkeley.edu/talks/methods-statistical-physics-ii
Deep Learning Theory Workshop and Summer School

0:58:40

Tutorial: Methods from Statistical Physics I

Ahmed El Alaoui (Cornell)
https://simons.berkeley.edu/talks/methods-statistical-physics-i
Deep Learning Theory Workshop and Summer School

0:57:45

Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting

Neil Mallinar (UC San Diego) & Jamie Simon (UC Berkeley)
https://simons.berkeley.edu/node/21931
Deep Learning Theory Workshop and Summer School

The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied benign overfitting, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks do not fit benignly: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime tempered overfitting, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with power-law spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning.

Joint Work With: Amirhesam Abedsoltan, Parthe Pandit, Mikhail Belkin, Preetum Nakkiran.

Link: https://arxiv.org/abs/2207.06569
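
As a rough numerical illustration of the tempered regime described above (a sketch, not the paper's experiments), the following fits near-interpolating kernel regression with a Laplace kernel to noisy one-dimensional data; the training error is essentially zero while the excess test risk is inflated by the noise yet remains bounded. The target function, bandwidth, noise level, and sample sizes are invented for illustration.

# Sketch: near-interpolating kernel regression with a Laplace kernel on noisy data.
import numpy as np

rng = np.random.default_rng(0)

def laplace_kernel(X, Y, bandwidth=1.0):
    # Laplace kernel on 1-D inputs: exp(-|x - y| / bandwidth)
    return np.exp(-np.abs(X[:, None] - Y[None, :]) / bandwidth)

f_star = np.sin                                     # target function (arbitrary)
n_train, noise_std = 200, 0.5
X_train = rng.uniform(-3, 3, n_train)
y_train = f_star(X_train) + noise_std * rng.normal(size=n_train)

K = laplace_kernel(X_train, X_train)
alpha = np.linalg.solve(K + 1e-8 * np.eye(n_train), y_train)   # near-zero ridge: interpolation

X_test = rng.uniform(-3, 3, 2000)
y_pred = laplace_kernel(X_test, X_train) @ alpha
print("train MSE:", np.mean((K @ alpha - y_train) ** 2))             # ~0: fits the noise
print("excess test risk:", np.mean((y_pred - f_star(X_test)) ** 2))  # nonzero but bounded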