Less than one percent of words would be affected by gender-inclusive language in German press texts

Linguistics and Languages

C. Müller-Spitzer, S. Ochs, et al.

This study by Carolin Müller-Spitzer, Samira Ochs, Alexander Koplenig, Jan Oliver Rüdiger, and Sascha Wolfer shows that making German press texts gender-inclusive would require changing less than 1% of all words. This finding challenges the common assumption that gender-inclusive language requires extensive changes that burden texts and impair their readability.

Introduction
The debate about gender-inclusive language in German often centers on the claim that it makes texts cumbersome, complicated, and longer, which could hinder comprehension, for example for language learners. Previous research indicates two things: gender-inclusive language does not reduce comprehensibility, and readers adapt to it quickly. Resistance nevertheless persists. This study addresses the quantitative side of the debate by measuring empirically what proportion of a text would need to be modified to make it gender-inclusive. This information has so far been lacking, because existing research has focused on qualitative aspects or used smaller datasets. The study manually annotates German press texts from several sources to determine which tokens would need to change. It focuses on personal nouns, pronouns, and dependent elements within noun phrases. Manual annotation is necessary because masculine generics cannot be distinguished from masculine specifics by form alone. Only the context shows whether, for example, a masculine plural refers to a mixed group or to men only. This sets the study apart from prior work, which has mostly examined automatically identifiable elements such as pronouns in English texts. The core question is how much textual material would need to be altered to make German press texts gender-inclusive. Answering it provides empirical evidence on claims that have so far been unsubstantiated.
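To make the measured quantity concrete, here is a minimal Python sketch of three ratios for a single document. These are the ratios reported in the findings:

- the share of tokens with person reference;
- the share of all tokens that would need to change;
- the share of person-reference tokens that would need to change.

The `Token` class and its two boolean flags are hypothetical simplifications of the study's 11-category annotation scheme. The flags in the example sentence are illustrative only and assume a generic (mixed-group) reading of "Ärzte".

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    person_ref: bool    # hypothetical flag: token is (part of) a person reference
    needs_change: bool  # hypothetical flag: token would change in an inclusive version


def document_shares(tokens: list[Token]) -> tuple[float, float, float]:
    """Return (person-reference share, affected share of all tokens,
    affected share of person-reference tokens) for one document."""
    total = len(tokens)
    person = [t for t in tokens if t.person_ref]
    affected_all = sum(t.needs_change for t in tokens)
    affected_person = sum(t.needs_change for t in person)
    # Guard against documents that contain no person references at all.
    share_in_person = affected_person / len(person) if person else 0.0
    return len(person) / total, affected_all / total, share_in_person


# Illustrative annotation of "Die Ärzte behandeln die Patientin",
# assuming "Ärzte" is read as a masculine generic referring to a mixed group.
sentence = [
    Token("Die", True, False),
    Token("Ärzte", True, True),
    Token("behandeln", False, False),
    Token("die", True, False),
    Token("Patientin", True, False),
]
print(document_shares(sentence))  # (0.8, 0.2, 0.25)
```

Even in this tiny example, most tokens with person reference stay unchanged. Only the masculine generic noun itself is affected.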
Literature Review
The existing literature on gender and language in German reveals a significant gap. Psycholinguistic studies have repeatedly demonstrated a male bias in the interpretation of masculine generics. Corpus-based studies quantifying how gender-inclusive language would affect actual texts, by contrast, are scarce. Some studies have examined the readability of gender-inclusive texts, but few have provided quantitative data on how much of a text would have to change. Prior research in this area often relies on smaller datasets or on automatically detectable linguistic features. Such features cannot capture the distinction between generic and specific masculine forms, which is a crucial aspect of gender-inclusive language in German. This study builds on previous work that acknowledges how difficult it is to tell generic from specific masculine forms apart automatically. It therefore uses manual annotation to quantify them accurately.
Methodology
The study draws on the German Reference Corpus (DeReKo), sampling texts from four sources: the Deutsche Presse-Agentur (DPA) news agency and the magazines Brigitte, Zeit Wissen, and Psychologie Heute. DPA texts form the primary dataset for three reasons: their volume, their objectivity, and the agency's recent move toward more gender-neutral language. Control corpora, consisting of longer texts from the three magazines, are included for comparison. Sampling is restricted to documents in the inner 90% of the word-count distribution. This excludes unusually short and unusually long texts while still covering a range of text lengths. The final sample comprises 261 texts (184 from DPA and 77 from the magazines). Two researchers annotated the texts manually using a purpose-built scheme for identifying tokens that would need to change for gender-inclusive language. The scheme covers personal nouns, pronouns, and dependent elements. It also addresses the ambiguity between masculine generics and masculine specifics, which requires contextual interpretation. The scheme was refined over several months on the basis of pre-test results and ended up with 11 categories. Inter-annotator agreement was 77.89%. The analysis reports weighted means with 95% confidence intervals based on the hypergeometric distribution, which accounts for sampling without replacement. The calculation takes into account both per-document and overall token counts.
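Two of the steps above, the length-based sampling filter and the agreement check, can be sketched in Python. The summary does not say exactly how the inner 90% was determined or which agreement measure produced 77.89%, so this sketch rests on two assumptions:

- the cut-offs are the 5th and 95th percentiles of the word-count distribution;
- agreement is simple token-level percent agreement. A chance-corrected measure such as Cohen's kappa would yield a different, typically lower, figure.

All function names, category labels, and word counts below are hypothetical.

```python
import statistics


def inner_90_percent(word_counts: dict[str, int]) -> list[str]:
    """Return IDs of documents whose word count lies between the 5th and
    95th percentiles (assumed interpretation of 'inner 90%')."""
    # With n=20, quantiles() returns 19 cut points: the first is the
    # 5th percentile, the last is the 95th.
    cuts = statistics.quantiles(sorted(word_counts.values()), n=20, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return [doc for doc, wc in word_counts.items() if low <= wc <= high]


def percent_agreement(labels_a: list[str], labels_b: list[str]) -> float:
    """Share of tokens (in %) to which both annotators assigned the same label."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("annotations must be non-empty and aligned token by token")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100 * matches / len(labels_a)


# Hypothetical corpus: 100 documents with word counts 1..100.
docs = {f"doc{i}": i for i in range(1, 101)}
kept = inner_90_percent(docs)
print(len(kept))  # 90

# Hypothetical labels, not the study's 11 categories.
print(percent_agreement(["noun", "pron", "none", "noun"],
                        ["noun", "pron", "noun", "noun"]))  # 75.0
```

Trimming both tails keeps extreme outliers, such as one-line news flashes or very long features, from dominating token-weighted averages. The bulk of the length distribution remains in the sample.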
Key Findings
The analysis of the 261 annotated texts shows that, on average, 9.45% of all tokens (95% CI: 9.26% to 9.64%) represent or are part of a person reference. Crucially, only 0.95% of all tokens (95% CI: 0.89% to 1.01%) would need to change to make the texts gender-inclusive. Among tokens with person reference alone, this proportion rises to 9.99% (95% CI: 9.37% to 10.63%). The DPA corpus has a smaller proportion of tokens needing change (0.73%; 95% CI: 0.66% to 0.81%) than the control corpus (1.18%; 95% CI: 1.09% to 1.29%). Broken down by linguistic class, most tokens requiring modification (90.08%) are personal nouns, specifically masculine generics. On average, 25% of personal nouns are affected (95% CI: 23.51% to 26.54%), and 81 documents contain no personal nouns requiring modification at all. Pronouns and dependent elements rarely need to change. The distribution of personal noun types differs markedly between the two corpora. DPA texts predominantly contain masculine specifics, whereas the control corpus has a higher proportion of epicene nouns, masculine generics, and feminized forms. Person reference in DPA also shows a strong male bias. Among references whose gender can be identified from context, 80.37% refer to men and 19.01% to women. On the same basis, the control corpus is more balanced (45.29% male, 52.87% female). Finally, 44.44% of the analyzed texts use both masculine generics and masculine specifics. Their proportions vary widely across documents, indicating that the masculine generic is used in diverse contexts.
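The figures above are token-weighted means with hypergeometric 95% confidence intervals. The summary does not give the exact procedure, so the following Python sketch is one possible reading, not the authors' code. It makes two modeling choices:

- Weighting each document's share by its token count reduces to pooling the counts.
- The interval is obtained by inverting two one-sided hypergeometric tests. This is an exact, Clopper-Pearson-style construction for a sample of `n` tokens drawn without replacement from a finite population of `pop` tokens.

The population size and all counts in the example are hypothetical, and the helper names are invented for illustration.

```python
import math


def log_comb(n: int, k: int) -> float:
    """log C(n, k) via log-gamma, so large token counts do not overflow."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def hyper_pmf(x: int, pop: int, marked: int, draws: int) -> float:
    """P(X = x) when `draws` tokens are taken without replacement from
    `pop` tokens, `marked` of which would need to change."""
    if x < max(0, draws - (pop - marked)) or x > min(draws, marked):
        return 0.0
    return math.exp(log_comb(marked, x) + log_comb(pop - marked, draws - x)
                    - log_comb(pop, draws))


def weighted_share(docs: list[tuple[int, int]]) -> float:
    """Token-weighted mean of per-document shares; docs are (affected, total)."""
    # sum(a/t * t) / sum(t) simplifies to pooled affected / pooled total.
    return sum(a for a, _ in docs) / sum(t for _, t in docs)


def hypergeom_ci(k: int, n: int, pop: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact interval for the population share, given k affected tokens in a
    sample of n drawn without replacement from pop tokens."""
    def p_at_least(K: int) -> float:  # P(X >= k | K marked), non-decreasing in K
        return sum(hyper_pmf(x, pop, K, n) for x in range(k, n + 1))

    def p_at_most(K: int) -> float:   # P(X <= k | K marked), non-increasing in K
        return sum(hyper_pmf(x, pop, K, n) for x in range(0, k + 1))

    lo, hi = k, pop - (n - k)  # values of K compatible with the observed sample
    a, b = lo, hi              # smallest K not rejected by the upper-tail test
    while a < b:
        mid = (a + b) // 2
        if p_at_least(mid) > alpha / 2:
            b = mid
        else:
            a = mid + 1
    lower = a
    a, b = lo, hi              # largest K not rejected by the lower-tail test
    while a < b:
        mid = (a + b + 1) // 2
        if p_at_most(mid) > alpha / 2:
            a = mid
        else:
            b = mid - 1
    return lower / pop, a / pop


docs = [(1, 100), (3, 100), (15, 1800)]  # hypothetical (affected, total) pairs
share = weighted_share(docs)             # 19 / 2000 = 0.0095
low, high = hypergeom_ci(k=19, n=2000, pop=20000)
print(f"{share:.2%} (95% CI: {low:.2%} to {high:.2%})")
```

Under this model, the interval collapses to a single point when the sample is the whole population (`n == pop`). It also narrows as the sampling fraction grows, which is the finite-population effect that a hypergeometric interval captures and a binomial one ignores.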
Discussion
The findings directly answer the central research question. The proportion of text that would need to change for gender inclusivity is substantially lower than frequently asserted. The figure of less than 1% of all tokens strongly suggests that the perceived burden of gender-inclusive language is exaggerated. The higher proportion within person references (9.99%) deserves attention. Even so, roughly 90% of person-reference tokens would remain unchanged. The differences in gender representation between the DPA and control corpora point to the influence of the source material and possibly to topic-specific biases. The proportion of modified tokens is low, and many gender-inclusive alternatives are readily available. Together, these results counter claims that such changes substantially impair readability and learnability. The male bias observed in DPA texts underlines the importance of gender-inclusive language for promoting balanced representation.
Conclusion
This study provides the first quantitative empirical evidence on the extent of textual alterations needed for gender inclusivity in German press texts. The small percentage of affected tokens challenges prevalent arguments against gender-inclusive language. Future research should explore the cognitive processing differences between gender-inclusive and non-inclusive texts using large language models and examine genre-specific variations in gender representation.
Limitations
The study is limited to press texts, so its results may not generalize to other text types. The manual annotation was rigorous but may still introduce subjective bias, despite the high inter-annotator agreement. The sampling method aims for representativeness but may not capture the full range of language use.