Abstract
In-context learning refers to the ability of a model to learn new tasks from a sequence of input-output pairs given in a prompt. Crucially, this learning happens at inference time, without any updates to the model's parameters. I will discuss our empirical efforts that shed light on some basic aspects of in-context learning: To what extent can Transformers, or other models such as LSTMs, be efficiently trained to in-context learn fundamental function classes, such as linear models, sparse linear models, and small decision trees? How can one evaluate in-context learning algorithms? And what are the qualitative differences between these architectures with respect to their ability to be trained to perform in-context learning? I will also discuss subsequent work by other researchers that illuminates connections between language modeling and learning: Must a good language model be able to perform in-context learning? Do large language models know how to perform regression? And are such primitives useful for language-centric tasks? This talk is based largely on joint work with Shivam Garg, Dimitris Tsipras, and Percy Liang.
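To make the setup concrete, here is a minimal NumPy sketch (illustrative only, not the talk's code) of what it means to in-context learn a linear model: a prompt consists of example pairs (x_i, y_i) drawn from a hidden linear function, and a learner that succeeds in-context should match the least-squares predictor fit to those prompt examples.

```python
import numpy as np

# Hypothetical illustration of the in-context learning setup for linear models.
rng = np.random.default_rng(0)

d, n_context = 5, 20
w = rng.normal(size=d)               # hidden linear function: y = w . x
X = rng.normal(size=(n_context, d))  # in-context (prompt) examples
y = X @ w                            # noiseless targets

# A model that in-context learns linear functions should agree with the
# least-squares solution computed from the prompt examples alone.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

x_query = rng.normal(size=d)         # held-out query point
pred = x_query @ w_hat               # in-context prediction
error = abs(pred - x_query @ w)      # gap to the true function
```

Here the least-squares baseline stands in for an ideal in-context learner; the empirical question is how closely a trained Transformer's predictions track it.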