Abstract

I will introduce the in-context learning capability of large language models: the ability to learn to solve a downstream task simply by conditioning on a prompt of input-output examples, without any parameter updates. I will then present several papers that aim to theoretically explain the mechanisms of in-context learning on simplified data distributions.