Abstract

Deep learning performance continues to improve as models and datasets are scaled up. This talk will discuss work on how predictably performance varies with model, dataset, and compute scale, for deep learning in general and large language models in particular. I will review scaling in linear models, a simple analytic system that exhibits many of the phenomena characteristic of realistic networks. I will also discuss empirical work investigating which types of problems can practically be solved by scale alone and which cannot.

Video Recording