Abstract

In this talk, I'll describe the "data gap" between LLMs and humans (Frank, 2023, TiCS): LLMs are trained on 3-5 orders of magnitude more data than human children receive. I'll review several viewpoints on why this gap exists, including 1) innate knowledge, 2) active and social learning, 3) multimodal information, and 4) evaluation differences. While I can't settle this issue, I'll present some new data on the richness of multimodal input and on the consequences of evaluation differences. In particular, I'll discuss how the competence/performance distinction from cognitive science plays out in LLMs.