The study of language evolution, particularly the cultural evolution of grammar, is difficult because language leaves no direct fossil record. This paper addresses that challenge by focusing on one grammatical change in English: the replacement of the 'be+PP' perfect construction by the 'have+PP' construction for intransitive verbs (PP = past participle). This shift, a change in auxiliary selection, is well documented in historical linguistics. Previous studies relied on smaller corpora. This research instead draws on three large datasets (EEBO, COHA, and Google Books) to quantify the dynamics of the change and to determine whether natural selection or random drift drove it. Beyond this single construction, the case bears on broader questions in cultural evolution, in particular how linguistic factors and sociocultural pressures interact to shape language change and diversity. By combining large-scale data with quantitative models of evolutionary change, the study brings a quantitative approach to a field that has traditionally been qualitative.
Literature Review
Prior research on the English perfect has established the historical shift from the prevalent use of 'be+PP' to the dominant use of 'have+PP', particularly with intransitive verbs. Studies based on smaller corpora such as YCOE, PPCME2, PPCEME, and PPCMBE observed that 'have+PP' began to increase in Late Middle English and that 'be+PP' declined around the 19th century. These studies identified various factors favoring 'have+PP'. Some were linguistic: past counterfactual modality, iterative, durative, and atelic meanings, and telic eventualities. Others were extralinguistic: chronology and text type. However, the limited size of these corpora prevented a definitive conclusion on whether natural selection or random drift was primarily responsible for the change. This study addresses that limitation by analyzing a much larger dataset with more sophisticated statistical methods.
Methodology
This study used three large-scale datasets: Early English Books Online (EEBO), covering 1473–1700; the Corpus of Historical American English (COHA), spanning 1810–2009; and Google Books Ngram data from 1700–2000. Restricting the analysis to intransitive verbs was crucial to avoid confusion with the passive voice, which also uses 'be+PP'. Target verbs were selected by their frequency in the Corpus of Contemporary American English (COCA), ensuring sufficient occurrences in all three datasets. Two groups of verbs were analyzed: Group A (13 high-frequency verbs) and Group B (6 verbs from previous studies). For each verb, frequency time series were constructed for 'be+PP' and 'have+PP'. To combine data across corpora, the relative frequencies in the Google Books data were rescaled using the period in which they overlap with COHA. The resulting time series were then binned with the log N binning method so that each bin contains a comparable amount of data. Evolutionary forces were detected with the neural Time Series Classification (TSC) model, a deep neural network trained on simulations of the Wright-Fisher model. The Frequency Increment Test (FIT) was also applied for comparison, but a post-hoc power analysis showed that the data were insufficient for reliable FIT results.
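The data-preparation steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. It assumes a hypothetical per-year layout of (have+PP, be+PP) frequencies. It also assumes the Google Books rescaling is a single multiplicative factor chosen so that Google Books totals match COHA totals over the overlapping years. The binning function is a simple equal-count stand-in, because the exact log N rule is not described here. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: year -> (have+PP frequency, be+PP frequency)
Series = Dict[int, Tuple[float, float]]


def overlap_scale_factor(google: Series, coha: Series) -> float:
    """Assumed rescaling: one factor that makes the Google Books totals
    match the COHA totals, summed over the years both sources cover."""
    years = sorted(set(google) & set(coha))
    if not years:
        raise ValueError("Google Books and COHA series do not overlap")
    google_total = sum(h + b for h, b in (google[y] for y in years))
    coha_total = sum(h + b for h, b in (coha[y] for y in years))
    return coha_total / google_total


def rescale(google: Series, factor: float) -> Series:
    """Scale both constructions by the same factor, so each year's
    'have+PP' proportion is unchanged."""
    return {y: (h * factor, b * factor) for y, (h, b) in google.items()}


def have_proportion(series: Series) -> Dict[int, float]:
    """Per-year proportion of 'have+PP' among all perfect tokens."""
    return {y: h / (h + b) for y, (h, b) in series.items() if h + b > 0}


def equal_count_bins(series: Series, n_bins: int) -> List[Tuple[int, int, float]]:
    """Stand-in for the binning step (not the log N rule itself).

    Walks through the years in order and closes a bin once it holds at least
    1/n_bins of all tokens; the final bin may be smaller. Returns at most
    n_bins tuples of (first_year, last_year, have_proportion)."""
    years = sorted(series)
    target = sum(h + b for h, b in series.values()) / n_bins
    bins: List[Tuple[int, int, float]] = []
    start, have, total = years[0], 0.0, 0.0
    for i, year in enumerate(years):
        h, b = series[year]
        have += h
        total += h + b
        is_last = i == len(years) - 1
        if total >= target or is_last:
            if total > 0:  # skip an empty trailing bin
                bins.append((start, year, have / total))
            if not is_last:
                start = years[i + 1]
            have, total = 0.0, 0.0
    return bins
```

Because the same factor multiplies both constructions, rescaling changes only how much weight each Google Books year carries during binning, not its 'have+PP' proportion. How overlapping years from different corpora are reconciled after rescaling is a separate choice that this sketch leaves open.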
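The neural TSC is trained on Wright-Fisher simulations, and FIT works directly on frequency increments. The Python sketch below (using NumPy and SciPy) illustrates both ideas. It is not the authors' code. It assumes a haploid, two-variant Wright-Fisher model with constant population size n and a selection coefficient s favoring 'have+PP'. For FIT it follows the standard formulation: the rescaled increments Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1}(1 - v_{i-1})(t_i - t_{i-1})) have mean zero under neutrality, so they are tested against zero with a one-sample t-test. The study's simulation settings are not given here, so the values in the demo lines are arbitrary, and the function names are hypothetical.

```python
import numpy as np
from scipy import stats


def wright_fisher(p0: float, n: int, s: float, generations: int,
                  rng: np.random.Generator) -> np.ndarray:
    """Haploid two-variant Wright-Fisher model (assumed setup).

    Each generation, selection shifts the expected frequency of the favored
    variant to p(1+s) / (1 + p s); the next generation is a binomial draw of
    size n around that expectation. s = 0 gives pure random drift."""
    freqs = np.empty(generations + 1)
    freqs[0] = p = p0
    for g in range(1, generations + 1):
        expected = p * (1 + s) / (1 + p * s)
        p = rng.binomial(n, expected) / n
        freqs[g] = p
    return freqs


def fit_test(freqs: np.ndarray, times: np.ndarray) -> tuple[float, float]:
    """Frequency Increment Test: rescale increments so that under neutrality
    they are roughly i.i.d. with mean zero, then run a one-sample t-test.
    Returns (t statistic, two-sided p-value)."""
    v_prev, v_next = freqs[:-1], freqs[1:]
    dt = np.diff(times)
    # Increments that start from an absorbed frequency (0 or 1) are undefined.
    keep = (v_prev > 0) & (v_prev < 1)
    y = (v_next[keep] - v_prev[keep]) / np.sqrt(
        2 * v_prev[keep] * (1 - v_prev[keep]) * dt[keep])
    result = stats.ttest_1samp(y, popmean=0.0)
    return float(result.statistic), float(result.pvalue)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample_times = np.arange(0, 201, 10)  # observe every 10th generation
    for label, s in [("drift", 0.0), ("selection", 0.05)]:
        traj = wright_fisher(0.2, 1000, s, 200, rng)
        t, p = fit_test(traj[sample_times], sample_times)
        print(f"{label}: t = {t:.2f}, p = {p:.3g}")
```

With few sampled time points, the t-test has little power. This matches the post-hoc finding that the available data were insufficient for reliable FIT results.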
Key Findings
The analysis of the 13 verbs in Group A revealed a consistent increase in the frequency of 'have+PP' over time, with a marked shift between 1750 and 1800. The neural TSC classified 17 of the 19 verbs in Groups A and B as showing selection, indicating that directional forces likely drove the transition from 'be+PP' to 'have+PP'. The two exceptions, 'bound' and 'go', were attributed to 'be+PP' strings that are in fact passive or adjectival constructions. When the analysis was expanded to all verbs with at least 30 occurrences in each corpus, the results were similar: 33 of 36 verbs were classified as selection, and the exceptions were attributable to the same kinds of factors. These results contrast with Newberry et al. (2017), who emphasized the role of random drift, and they show how much data scale and detection method matter when inferring evolutionary forces. The study also reports the evolutionary speed, pattern, and dynamics of each verb, supplementing earlier aggregate-level observations.
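One simple way to summarize a single verb's evolutionary speed is the slope of the logit of its 'have+PP' proportion over time. This is offered as an illustration and is not necessarily the measure used in the study. Under deterministic selection with relative fitness 1 + s for 'have+PP', each step maps p to p(1+s)/(1+ps). As a result, logit(p) rises by exactly ln(1+s) per step, and a straight-line fit to logit proportions recovers that rate. The Python sketch below assumes one selection step per time unit. Its function names are hypothetical.

```python
import numpy as np


def logit(p: np.ndarray) -> np.ndarray:
    """Log-odds of the 'have+PP' proportion."""
    return np.log(p / (1 - p))


def logit_slope(times: np.ndarray, p_have: np.ndarray) -> float:
    """Least-squares slope of logit('have+PP' proportion) against time.
    Proportions of exactly 0 or 1 have no finite logit and are dropped."""
    times = np.asarray(times, dtype=float)
    p_have = np.asarray(p_have, dtype=float)
    ok = (p_have > 0) & (p_have < 1)
    slope, _intercept = np.polyfit(times[ok], logit(p_have[ok]), deg=1)
    return float(slope)


def deterministic_trajectory(p0: float, s: float, steps: int) -> np.ndarray:
    """Noise-free selection recursion p' = p(1+s) / (1 + p s)."""
    p = np.empty(steps + 1)
    p[0] = p0
    for t in range(steps):
        p[t + 1] = p[t] * (1 + s) / (1 + p[t] * s)
    return p


if __name__ == "__main__":
    years = np.arange(1700, 1801)
    p = deterministic_trajectory(0.05, 0.03, 100)
    # The fitted slope should match ln(1 + s) for this noise-free trajectory.
    print(logit_slope(years, p), np.log1p(0.03))
```

On real, noisy, binned data the slope is only an estimate. Converting it to a selection coefficient per generation would additionally require an assumed generation length.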
Discussion
The findings strongly suggest that natural selection, rather than random drift, played a major role in the evolution of the English perfect. This conclusion rests on the neural TSC analysis, which classified the 'be+PP' to 'have+PP' transition as selection-driven for a substantial majority of the verbs examined. Previous research identified linguistic and extralinguistic factors that may have influenced the shift. This study complements that work with a quantitative, large-scale confirmation of the change at the level of individual verbs. More broadly, it shows how combining large-scale corpus data with quantitative models of evolutionary change can sharpen our understanding of language evolution.
Conclusion
This study confirms and extends previous findings on the shift from 'be+PP' to 'have+PP' in the English perfect, and it provides compelling evidence that natural selection influenced this grammatical change. Large-scale corpora and a robust neural network model allowed a more precise analysis of the evolutionary dynamics involved. Future research should investigate this phenomenon in other languages. It should also explore how the evolution of the perfect relates to its functional development (aspect and modality), and examine the potential role of genre in shaping the relative frequency of the two constructions.
Limitations
This study's conclusions rest on the neural TSC, which uses the Wright-Fisher model as its null hypothesis. If the real dynamics depart from that model's assumptions, such as a constant population size and non-overlapping generations, the classifications may be less accurate. The datasets also combine British and American English, which may mask dialect differences in how this grammatical feature evolved. Finally, the proportion of scientific texts in the Google Books corpus rises over time, which may bias the frequency estimates.