Skip to yearly menu bar Skip to main content

Invited Talk
Workshop: Talking to Strangers: Zero-Shot Emergent Communication

Invited Talk 3: Richard Futrell (UCI) - Information-theoretic models of natural language

Richard Futrell


I claim that human languages can be modeled as information-theoretic codes, that is, systems that maximize information transfer under certain constraints. I argue that the relevant constraints for human language are those involving the cognitive resources used during language production and comprehension and in particular working memory resources. Viewing human language in this way, it is possible to derive and test new quantitative predictions about the statistical, syntactic, and morphemic structure of human languages. I start by reviewing some of the many ways that natural languages differ from optimal codes as studied in information theory. I argue that one distinguishing characteristic of human languages, as opposed to other natural and artificial codes, is a property I call information locality: information about particular aspects of meaning is localized in time within a linguistic utterance. I give evidence for information locality at multiple levels of linguistic structure, including the structure of words and the order of words in sentences. Next, I state a theorem showing that information locality is an inevitable property of any communication system where the encoder and/or decoder are operating under memory constraints. The theorem yields a new, fully formal, and quantifiable definition of information locality, which leads to new predictions about word order and the structure of words across languages. I test these predictions in broad corpus studies of word order in over 50 languages, and in case studies of the order of morphemes within words in two languages.