The rising entropy of English in the attention economy

C. Pilgrim, W. Guo, et al.

Dive into research by Charlie Pilgrim, Weisi Guo, and Thomas T. Hills, revealing how the word entropy of American English has steadily increased since 1900! Discover how word entropy varies across media types, with short-form media showing higher entropy than long-form media. This study combines ecological models and consumer behavior insights to explain why shorter formats are becoming the go-to choice for media consumers.

Introduction
The study asks how the word entropy of American English has evolved and how it relates to the characteristics of different media. The context is the 'attention economy', in which media producers compete for limited human attention. The aim is to test whether growing competition for attention changes the structure of language, as measured by word entropy, which captures how repetitive or novel the word distribution of a text is. The work matters because it may reveal fundamental links between linguistic evolution, technological change, and human information processing in the digital age, and because knowing how the attention economy shapes language can inform our understanding of communication and media consumption. The central hypothesis is that increased competition for attention drives the observed changes in word entropy. This hypothesis is novel in connecting linguistic features to broader socio-technological trends, and it departs from traditional linguistic approaches by focusing on the dynamic interplay between information producers and consumers.
Literature Review
The study builds upon Zipf's law, which describes a power law relationship between word frequency and rank. Existing research has suggested that Zipf's law emerges from a balance between the benefits of informative messages (for listeners) and the costs of generating high word entropy text (for speakers). However, this balance has shifted due to changes in modern communication systems. Herbert Simon's concept of 'poverty of attention' highlights the scarcity of attention in the information-rich world, emphasizing the competition for attention in the attention economy. Information foraging theory provides a framework to understand how people search for and consume information, adapting their strategies to different environments. The study extends information foraging theory to model the interplay between producers and consumers in the attention economy.
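For reference, the two quantities this review leans on can be written out as follows. This is a standard textbook formulation rather than notation taken from the study: the symbols f(r) (frequency of the word at rank r), \alpha (the Zipf exponent), V (the vocabulary of a text sample), and p(w) (the relative frequency of word w) are assumptions introduced here for illustration.

```latex
% Zipf's law: word frequency falls off as a power of frequency rank
f(r) \propto r^{-\alpha}

% Unigram word entropy (in bits) of a text sample with vocabulary V
H = -\sum_{w \in V} p(w) \, \log_2 p(w)
```

Under these definitions, a text that repeats a few words heavily has low H, while a text that spreads its tokens more evenly over many words has high H.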
Methodology
The study analyzed several text corpora to track how information content evolved across media sources. The Corpus of Historical American English (COHA) provided data from 1810 to the 2000s, categorized into fiction, non-fiction, news, and magazines. The Corpus of Contemporary American English (COCA) and the British National Corpus (BNC) provided more recent data. Social media data were collected from Twitter (now X) and Reddit. Text samples were truncated to 2000 words to control for sample size effects. Data cleaning removed headers, tags, sentences with "@" symbols, apostrophes, extra whitespace, and non-text tokens. Three lexical measures were used to track information content: unigram word entropy, type-token ratio, and Zipf exponent. Time series breakpoint analysis and trend analysis (KPSS and Mann-Kendall tests) were used to examine changes over time in COHA. ANOVA tests were used to compare the lexical measures across media categories in COCA, BNC, and COHA. A case study of US magazine circulation explored the connection between word entropy and changes in the publishing industry. For social media data, posts were collated chronologically to simulate a user's feed, which produced larger text samples.
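The three lexical measures named above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function name `lexical_measures`, the `max_tokens` parameter, and the choice to estimate the Zipf exponent by a least-squares fit of log frequency against log rank are all assumptions made here (the study may use a different estimator). The input is assumed to be an already cleaned list of word tokens.

```python
import math
from collections import Counter


def lexical_measures(tokens, max_tokens=2000):
    """Return (word entropy in bits, type-token ratio, Zipf exponent estimate)."""
    sample = tokens[:max_tokens]  # truncate to a fixed sample size
    n = len(sample)
    if n == 0:
        raise ValueError("need at least one token")
    counts = Counter(sample)

    # Unigram word entropy: H = -sum over words of p(w) * log2 p(w)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Type-token ratio: distinct words divided by total words
    ttr = len(counts) / n

    # Zipf exponent (assumed estimator): negative least-squares slope of
    # log(frequency) against log(rank); undefined for fewer than two types
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        return entropy, ttr, float("nan")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return entropy, ttr, -sxy / sxx
```

A call such as `lexical_measures(text.lower().split())` would apply it to a whitespace-tokenized string; that tokenization is itself a simplification. Fixing the sample length matters because type-token ratio and entropy both depend on sample size, which is why the samples are truncated before measuring.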
Key Findings
Analysis of the COHA corpus revealed a clear trend of rising word entropy in American English since approximately 1900. This trend was consistent across all media categories (fiction, non-fiction, news, and magazines), further supported by similar patterns in type-token ratio and Zipf exponent. Time series analysis using KPSS and Mann-Kendall tests confirmed significant upward trends in word entropy, type-token ratio, and Zipf exponent for all categories between 1900 and 2010. Comparison across different media categories using COCA and BNC, in addition to the social media datasets, demonstrated significantly higher word entropy in short-form media (news, magazines) than long-form media (fiction, non-fiction), with social media exhibiting the highest entropy. ANOVA tests across corpora indicated significant differences in word entropy, type-token ratio, and Zipf exponent between media categories. The case study on US magazine circulation suggested a correlation between increasing word entropy and increased competition in the magazine industry. Simulations of the attention economy model supported the observed trends, showing that increased information prevalence leads to higher word entropy and a preference for information-dense short-form media.
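To make the trend testing concrete, the following is a minimal sketch of the Mann-Kendall test using the standard normal approximation with a correction for tied values. It is not the authors' code: the function name `mann_kendall` and its return format are assumptions, and the KPSS test is not covered here.

```python
import math
from collections import Counter


def mann_kendall(series):
    """Return (S, z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")

    # S counts later-minus-earlier increases minus decreases over all pairs
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S under the no-trend null, corrected for groups of ties
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if var_s == 0 or s == 0:
        return s, 0.0, 1.0

    # Continuity-corrected z score and two-sided normal p-value
    z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p
```

Applied to a yearly series of mean word entropy, a positive z with a small p-value would indicate a monotonic upward trend. The test is rank-based, so it makes no assumption that the trend is linear.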
Discussion
The findings support the hypothesis that increased competition for attention in the attention economy is linked to rising word entropy in American English. The information foraging model successfully explains both the overall increase in word entropy and the differences across media categories. The model highlights the role of information selection in shaping language, suggesting that humans are, within limits, information rate maximizers. The study contributes to our understanding of linguistic evolution by identifying the attention economy as a significant factor driving changes in language structure. This complements existing theories such as the Linguistic Niche Hypothesis, suggesting that while there might be opposing pressures towards learnability and expressivity, a nuanced analysis could reconcile these dynamics. The model also explains the rise of short-form media as a consequence of efficient information search mechanisms.
Conclusion
The study demonstrates a significant increase in the word entropy of American English over time, and higher word entropy in short-form media than in long-form media. The proposed information foraging model of the attention economy offers an explanation for both trends. This research connects linguistic evolution with broader socio-technological factors, highlighting the interplay between human behavior and information environments. Future research could explore other measures of entropy, investigate how factors other than word entropy interact to attract human attention, and extend the analysis to languages beyond English and potentially to multimedia content.
Limitations
The model simplifies real-world dynamics and may overlook complex interactions. Steps were taken to limit the effect of data source variation on the conclusions, but balancing sample size against data availability remains a challenge. The lexical measures used are unidimensional and do not capture the full information in word distributions. The study focuses on text-based media and neglects the multimedia landscape. The attention economy is itself a complex system, in which factors such as optimization algorithms and the news cycle affect media consumption. The information foraging model also simplifies human behavior: it does not account for individual heterogeneity, cultural influences, or other factors that shape information consumption.