Abstract

The forces that govern how languages assign meanings to words have been debated for decades. Recently, it has been suggested that human semantic systems are adapted for efficient communication. However, a major question has been left largely unaddressed: how does pressure for efficiency relate to language evolution? Here, we address this open question by grounding the notion of efficiency in a general information-theoretic principle, the Information Bottleneck (IB) principle. Specifically, we argue that languages efficiently encode meanings into words by optimizing the IB tradeoff between the complexity and accuracy of the lexicon. In support of this hypothesis, we first show that color naming across languages is near-optimally efficient in the IB sense. Furthermore, this finding suggests (1) a theoretical explanation for why inconsistent naming and stochastic categories may be efficient; and (2) that languages may evolve under pressure for efficiency, through an annealing-like process that synthesizes continuous and discrete aspects of previous accounts of color category evolution. This process generates quantitative predictions for how color naming systems may change over time. These predictions are directly supported by an analysis of recent data documenting changes over time in the color naming system of a single language. Finally, we show that this information-theoretic approach generalizes to two qualitatively different semantic domains: names for household containers and animal taxonomies. Taken together, these results suggest that efficient coding, a general principle that also applies to lower-level neural representations, may largely explain the structure and evolution of semantic representations across languages.