This study applies an information-theoretic approach, Kolmogorov complexity, to measure Chinese linguistic complexity. A corpus of approximately 60 million characters was used to calculate morphological, syntactical, and overall Kolmogorov complexity metrics, along with 18 other existing metrics. Results show significant correlations between the Kolmogorov metrics and the existing metrics, which indicates that the Kolmogorov metrics are reliable. Comparisons with nine European languages, and across Chinese L1/L2 speakers of varying proficiency, demonstrate the validity of the Kolmogorov approach in capturing key linguistic features of Chinese, such as morpheme richness and topic prominence.
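Kolmogorov complexity is uncomputable in general, so in practice it is usually approximated from above by the size of a text after lossless compression. The abstract does not say which compressor or normalization the authors used, so the following is only a minimal sketch under assumptions: the function name `compression_ratio`, the choice of `zlib`, and the two sample strings are all illustrative and are not taken from the study.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw UTF-8 size.

    A lower ratio means the compressor found more redundancy, which is
    read here as lower (approximate) Kolmogorov complexity.
    zlib is an assumed stand-in for whatever compressor a study uses.
    """
    raw = text.encode("utf-8")
    if not raw:
        raise ValueError("text must be non-empty")
    return len(zlib.compress(raw, 9)) / len(raw)


# Illustrative inputs only: a highly repetitive sentence versus a run of
# 1400 distinct CJK characters with no repeated substrings.
repetitive = "我们学习中文。" * 200
varied = "".join(chr(0x4E00 + (i * 7919) % 20000) for i in range(1400))

print(compression_ratio(repetitive) < compression_ratio(varied))
```

The repetitive string should yield the lower ratio. Metrics that target morphology or syntax specifically would need additional steps beyond this overall ratio, and those steps are not described in the abstract.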
Publisher
Humanities and Social Sciences Communications
Published On
Jul 30, 2024
Authors
Xun Liu, Feng Li, Wei Xiao
Tags
Kolmogorov complexity
Chinese linguistics
morphological metrics
syntactical metrics
language comparison
language proficiency
linguistic features