Dependency distance minimization: a diachronic exploration of the effects of sentence length and dependency types

Linguistics and Languages


X. Liu, H. Zhu, et al.

This study by Xueying Liu, Haoran Zhu, and Lei Lei examines the connection between dependency distance, syntactic difficulty, and working memory load. It reveals anti-minimization in short sentences and shows how sentence length shapes the diachronic evolution of dependency distance across dependency types.
Introduction

The paper investigates how dependency distance (the linear distance between a governor and its dependent) changes over time as an indicator of syntactic difficulty and working memory load. Prior synchronic studies support a tendency toward dependency distance minimization (DDM) across languages, while diachronic evidence suggests a historical decrease in dependency distance. However, inconsistencies exist regarding short sentences: some work reports DDM even for short sentences (≤10 words), whereas others find anti-DDM in sequences of three or four words. Additionally, different dependency types may impose different cognitive demands and could differentially contribute to diachronic DDM. The study therefore poses three research questions: (1) How is mean dependency distance (MDD) diachronically distributed in short sentences, and does anti-DDM exist? (2) What are the diachronic trends of dependency distance for different dependency types across sentence lengths? (3) Does DDM occur in all dependency types or only certain types?

Literature Review

Dependency distance/length is a robust predictor of syntactic difficulty and memory load, with cross-linguistic and corpus-based studies showing general DDM relative to randomized baselines. While synchronic studies (e.g., Liu 2008; Futrell et al. 2015) show minimized dependency distance across many languages, diachronic work (Lei and Wen 2020) indicates a long-term decline in dependency distance in English political addresses. However, Ferrer-i-Cancho and Gómez-Rodríguez (2021) argue for anti-DDM in very short sequences (3–4 words), potentially due to conflict between DDM and surprisal minimization/predictability maximization. Dependency type also matters: prior studies report varying MDDs across relation types and modalities (e.g., higher for clausal relations than modifiers; differences between spoken and written French). The position of the root (main verb) may play a key role in processing but is often excluded from MDD computations; recent work suggests root position warrants explicit analysis.

Methodology

Metrics: Dependency distance (DD) is computed as the absolute linear distance between the positions of a governor and its dependent in a sentence. Mean dependency distance (MDD) at the sentence level is the average of the DDs of all dependencies in a sentence, excluding root and punctuation for comparability with prior work. For dependency-type-level analysis, the MDD of a given relation type is the average DD across all instances of that type in the corpus. Root is analyzed separately as a dependency type (the position of the main verb) given its importance in processing.

Dataset: State of the Union Addresses (1790–2017) from 43 U.S. presidents (American Presidency Project), totaling 2,012,440 words and 71,155 sentences; the corpus is single-genre, spans a long time period, and is publicly available.

Parsing: Stanford CoreNLP 3.9.2 was used to obtain Universal Dependencies annotations; despite some annotation errors, prior work indicates the parser is sufficiently reliable for trend analysis.

Sentence length stratification: Sentences were grouped into 0–4, 5–10, 11–20, 21–30, and 31+ words. The fine-grained split of the 0–10 range into 0–4 and 5–10 targets the anti-DDM question for very short sentences.

Dependency relations: 39 relation types occurred in the corpus; MDDs were computed for 38 types, excluding punct.

Time-series analysis: For the yearly MDD series (sentence-level and relation-level) from 1790 to 2017, the non-parametric Mann–Kendall test assessed trend significance, and the Theil–Sen estimator provided slopes. Analyses used the pyMannKendall Python package. Example computations were illustrated with a parsed sentence showing how the DDs of individual relations contribute to sentence-level and relation-level MDDs.
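The metric definitions above can be sketched in a few lines of Python. The parse representation here (a list of 1-based `(dependent_position, governor_position, relation)` triples, with the root's governor at position 0) is a simplifying assumption for illustration, not the paper's exact data structure.

```python
def sentence_mdd(dependencies):
    """Sentence-level mean dependency distance: average absolute
    governor-dependent distance, excluding root and punctuation."""
    distances = [
        abs(dep - gov)
        for dep, gov, rel in dependencies
        if rel not in ("root", "punct")
    ]
    return sum(distances) / len(distances) if distances else 0.0

def relation_mdd(dependencies, relation):
    """Relation-level MDD: average distance over all instances
    of one relation type (across a corpus in the full analysis)."""
    distances = [abs(dep - gov) for dep, gov, rel in dependencies
                 if rel == relation]
    return sum(distances) / len(distances) if distances else 0.0

# Toy parse of "The president addressed the nation" (1-based positions):
# det(president-2, The-1), nsubj(addressed-3, president-2),
# root(ROOT-0, addressed-3), det(nation-5, the-4), dobj(addressed-3, nation-5)
parse = [
    (1, 2, "det"),
    (2, 3, "nsubj"),
    (3, 0, "root"),
    (4, 5, "det"),
    (5, 3, "dobj"),
]
print(sentence_mdd(parse))  # (1 + 1 + 1 + 2) / 4 = 1.25
```

Note how excluding root matters: including the distance-3 root dependency would inflate the sentence MDD, which is why the paper treats root position as a separate analysis.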

Key Findings

Sentence-level trends: (1) MDD increases with sentence length. (2) Very short sentences (0–4 words) show a significant increasing trend in MDD (anti-DDM), while sentences of 5+ words show decreasing MDD over time. Mann–Kendall test results (trend, P, Theil–Sen slope):

- 0–4 words: increasing, P=0.000, slope=+0.0018
- 5–10 words: decreasing, P=0.000, slope=−0.0007
- 11–20 words: decreasing, P=0.000, slope=−0.0004
- 21–30 words: decreasing, P=0.004, slope=−0.0003
- 31+ words: decreasing, P=0.000, slope=−0.0013

Dependency-type trends: (1) The number of relation types with decreasing MDD grows with sentence length: 0–4 words: 1 type; 5–10 words: 9 types; 11–20 words: 10 types; 21–30 words: 11 types; 31+ words: 18 types. (2) Root position increases in 0–4-word sentences but decreases in 5+ word sentences over time. (3) Nine dependency types consistently decrease in 5+ word sentences across the period: acl:relcl, aux, auxpass, ccomp, mark, neg, nsubj, nsubjpass, root. (4) Six types associated with noun phrases consistently increase in 5+ word sentences: compound:prt, compound, amod, det, nmod:poss, advmod. (5) In 0–4-word sentences, only case shows a decreasing trend, while root, appos, advmod, and cc increase, contributing to the overall anti-DDM in the shortest group.
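The two statistics reported above can be illustrated with a stdlib-only sketch: the Mann–Kendall S statistic (direction of a monotonic trend) and the Theil–Sen estimator (median of pairwise slopes). The paper used pyMannKendall, which also computes the normalized test statistic and P value omitted here; the yearly series below is a made-up illustration, not the paper's data.

```python
from itertools import combinations
from statistics import median

def mann_kendall_s(values):
    """Mann-Kendall S: sum of signs over all ordered pairs.
    S > 0 suggests an increasing trend, S < 0 a decreasing one
    (significance testing on S is omitted in this sketch)."""
    return sum(
        (vj > vi) - (vj < vi)
        for vi, vj in combinations(values, 2)
    )

def theil_sen_slope(years, values):
    """Theil-Sen estimator: median of slopes between all pairs."""
    slopes = [
        (values[j] - values[i]) / (years[j] - years[i])
        for i, j in combinations(range(len(years)), 2)
    ]
    return median(slopes)

# Hypothetical yearly MDD series with a mild downward drift,
# as found for sentences of 5+ words (illustration only).
years = [1790, 1840, 1890, 1940, 1990, 2017]
mdd = [2.60, 2.58, 2.55, 2.54, 2.50, 2.49]
print(mann_kendall_s(mdd))          # -15: all 15 pairs decreasing
print(theil_sen_slope(years, mdd))  # roughly -0.0005 per year
```

The slope's scale clarifies why the reported values look small: a Theil–Sen slope of −0.0005 per year still amounts to a change of about 0.1 in MDD over two centuries, which is substantial relative to the narrow range MDD typically occupies.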

Discussion

The findings indicate sentence length modulates diachronic DDM: longer sentences exhibit more relation types with decreasing dependency distances, likely due to greater memory pressures from longer dependencies, which incentivize minimization. Although the absolute changes are modest, they are significant given the small typical MDD range and relatively stable English word order over two centuries. In contrast, very short sentences (0–4 words) show anti-DDM, consistent with the predominance of surprisal minimization in contexts where memory load is negligible (within a 4-chunk capacity), thus favoring placements that increase dependency distances. The decreasing trends in root, nsubj/nsubjpass, aux/auxpass, acl:relcl, ccomp, mark, and neg suggest targeted simplifications of structures that tax working memory, including earlier main verb placement and reduced distances in subordinate/embedded constructions. Conversely, increasing MDDs for noun-phrase-related relations align with historical growth in NP complexity (e.g., multi-noun sequences) in informational registers, reflecting communicative pressures to condense information in compact nominal structures.

Conclusion

Diachronically, dependency distance in English shows length-dependent minimization: clear decreases for sentences of five or more words but anti-minimization for very short sentences. Only a subset of dependency relations—nine types—drive the observed DDM, while several NP-related relations increase in distance, likely reflecting rising NP complexity. Overall, the results support the hypothesis that languages tend toward minimized processing costs over time, contributing to syntactic simplification while accommodating informational demands.

Limitations

The analysis is limited to a single genre (political State of the Union Addresses) and a single language (English), constraining generalizability given known genre effects on dependency distance. Future work should test other genres and languages, and should further probe the role of dependency types across typologically diverse languages and in other modalities such as interpreting, to assess the consistency of the observed trends.
