Introduction
Social media plays a crucial role in contemporary political events, amplifying competing narratives while also serving as a platform for crisis communication and information dissemination. This study focuses on language use on social media in Ukraine before and during the Russian invasion. The full-scale invasion, which began on February 24, 2022, caused widespread devastation, resulting in significant civilian casualties, economic damage, and a massive refugee crisis. Because the conflict has been documented on social media in real time, it offers a unique opportunity to study the impact of war on language use. Language choice is inherently political and is often linked to cultural identity and nation-building. In Ukraine, where many citizens are bilingual in Ukrainian and Russian, language selection reflects complex identities. The Ukrainian government has historically aimed to promote Ukrainian language use, with varying degrees of success. This research investigates the evolution of language use on Ukrainian Twitter (now X) before and during the war, examining both long-term trends and the immediate impact of the invasion. Specifically, the study analyzes overall trends in the three main languages (Ukrainian, Russian, and English), determines whether observed changes are due to user turnover or to behavioral shifts, quantifies the magnitude of both effects, and examines the language shifts of individual users, particularly those switching from Russian to Ukrainian.
Literature Review
Existing research highlights the significance of social media in crisis management and political discourse. Studies have shown its role in amplifying misinformation and polarization while also facilitating communication during crises. Language use online is closely linked to identity, with multilingual individuals often adapting their language choices to context and audience. Language use is also a political act, often central to cultural identity, nation-building, and political change. Previous research on post-Soviet countries documents attempts to assert native languages through language laws after the dissolution of the USSR. In Ukraine, despite government efforts, a significant portion of the population historically identified as Russian or used Russian as their primary language. Recent studies, based on surveys and qualitative analyses of social media posts, show a growing shift toward Ukrainian identity and language use, accelerated in particular by the Euromaidan protests and the Russian intervention in Crimea and the Donbas. This study builds on that research by offering a large-scale quantitative analysis of language shift using social media data.
Methodology
This study used geo-tagged tweets from Ukraine posted between January 9, 2020, and October 12, 2022. Data collection proceeded in two stages: tweets were first gathered from Twitter's 1% real-time stream, and gaps were then filled retrospectively using the Twitter Research API. A sensitivity analysis showed that this strategy recovered almost all geo-tagged tweets from Ukraine during that period. Retweets were excluded so that the data comprised original tweets, quotes, and replies. Data cleaning involved removing duplicate tweets, identifying and removing potential spam bots using a trained bot detection model, and applying additional filtering rules. This resulted in a final dataset of 2,845,670 tweets from 41,696 users. The analysis focused on Ukrainian, Russian, and English tweets. A generalized additive mixed model (GAMM) was used to analyze tweeting activity, modeling the number of tweets per user per language per week with a Poisson distribution. This model disentangled sample effects (changes in the population of active users) from behavioral changes (changes in the tweeting patterns of active users). A similar GAMM was used to analyze language choice, modeling pairwise language probabilities (e.g., the probability of tweeting in Ukrainian rather than Russian) with a binomial distribution. The models incorporated smooth global time trends and user-specific random effects. Effect sizes were calculated as changes in expected tweeting activity and as odds ratios for the language choices, reported separately for sample effects and behavioral changes. Multilingual topic modeling with BERTopic was also conducted to analyze tweet content, focusing on topics related to the war.
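To make the two response variables described above more concrete, the following is a minimal sketch of how tweets could be aggregated before model fitting. It is not the study's code: the input layout, the language codes ("uk", "ru", "en"), and the helper names week_key, weekly_counts, and pairwise_trials are assumptions made for illustration, and the GAMM fitting itself is not shown.

```python
from collections import Counter
from datetime import date

# Hypothetical input rows: (user id, language code, date of the tweet).
tweets = [
    ("u1", "uk", date(2022, 2, 21)),
    ("u1", "ru", date(2022, 2, 22)),
    ("u1", "uk", date(2022, 3, 1)),
    ("u1", "uk", date(2022, 3, 2)),
    ("u2", "ru", date(2022, 2, 23)),
    ("u2", "en", date(2022, 3, 3)),
]


def week_key(day):
    """Return the ISO (year, week) pair that a date falls in."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


def weekly_counts(rows):
    """Count tweets per (user, language, week): the count response."""
    return Counter((user, lang, week_key(day)) for user, lang, day in rows)


def pairwise_trials(counts, first="uk", second="ru"):
    """Per (user, week), return (tweets in `first`, tweets in either language).

    These pairs are the successes and trials of a binomial response for
    choosing `first` over `second`; tweets in other languages are ignored.
    """
    trials = {}
    for (user, lang, week), n in counts.items():
        if lang not in (first, second):
            continue
        successes, total = trials.get((user, week), (0, 0))
        if lang == first:
            successes += n
        trials[(user, week)] = (successes, total + n)
    return trials


counts = weekly_counts(tweets)
trials = pairwise_trials(counts)
```

In this layout, each entry of counts would correspond to one observation of the Poisson activity model, and each entry of trials to one observation of the binomial language-choice model for the Ukrainian-versus-Russian pair. User-weeks with no tweets in either language of a pair simply do not appear in trials.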
Key Findings
Descriptive analysis revealed that Ukrainian and Russian were the most prevalent languages in the dataset, with a clear upward trend in Ukrainian tweets and a downward trend in Russian tweets over time. The outbreak of the war caused a sharp increase in tweets in all three languages. Analysis of weekly user activity showed a decrease in active users until the war's outbreak, followed by an increase. Examination of tweeting activity per user revealed a steady decrease in Russian tweets and a slow increase in Ukrainian tweets over time, with a sharp increase in Ukrainian tweets following the war's outbreak. The GAMM analysis disentangled sample and behavioral effects. Sample effects, which stem from changes in the active user population, showed a decline in Russian tweeting intensity and a sharp increase in Ukrainian tweeting intensity following the war's outbreak. Behavioral effects showed a long-term decrease in Russian tweeting activity and a steady increase in Ukrainian tweeting activity, both accelerating significantly after the war's outbreak. Analysis of language choice revealed a consistent increase in the probability of tweeting in Ukrainian rather than Russian, which accelerated greatly after the war's outbreak. The GAMM analysis showed that this shift was predominantly driven by behavioral changes. A significant portion of users who primarily tweeted in Russian before the war switched to primarily tweeting in Ukrainian afterward. Further analysis showed that a significant fraction of those switching from Russian to Ukrainian had increased engagement on Twitter and larger follower bases. Topic modeling revealed that users who switched languages were more likely to discuss the war in their tweets, although this effect was not significant when controlling for the overall number of tweets.
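The idea of a user who "primarily" tweeted in Russian before the war and in Ukrainian afterward can be illustrated with a small sketch. The rule used here (most frequent language in each period, with ties left unclassified) is an assumption for illustration and is not necessarily the criterion the study applied; the function names and sample rows are likewise hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

INVASION = date(2022, 2, 24)  # invasion date as given in the introduction


def primary_language(langs):
    """Return the most frequent language, or None for no tweets or a tie."""
    ranked = Counter(langs).most_common(2)
    if not ranked:
        return None
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def find_switchers(rows, source="ru", target="uk"):
    """List users whose primary language moved from `source` to `target`."""
    before = defaultdict(list)
    after = defaultdict(list)
    for user, lang, day in rows:
        (before if day < INVASION else after)[user].append(lang)
    return sorted(
        user
        for user in before
        if primary_language(before[user]) == source
        and primary_language(after.get(user, [])) == target
    )


# Hypothetical rows: (user id, language code, date of the tweet).
rows = [
    ("a", "ru", date(2021, 11, 1)),
    ("a", "ru", date(2022, 1, 15)),
    ("a", "uk", date(2022, 3, 5)),
    ("a", "uk", date(2022, 4, 1)),
    ("b", "ru", date(2021, 12, 1)),
    ("b", "ru", date(2022, 5, 1)),
    ("c", "uk", date(2022, 1, 1)),
    ("c", "uk", date(2022, 3, 1)),
    ("d", "ru", date(2022, 2, 1)),  # no tweets after the invasion
]

switchers = find_switchers(rows)
```

Under this rule only user "a" counts as a switcher. User "d", who stops tweeting after the invasion, is never classified, which is one way the selection effect noted in the limitations can arise.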
Discussion
The findings address the research question by demonstrating a substantial shift in language use on Ukrainian social media, particularly from Russian to Ukrainian, that coincides with and accelerates after the Russian invasion of Ukraine. The increase in Ukrainian language use reflects more than a change in the user population; it represents a significant behavioral shift driven by the war. The findings support the hypothesis that language choice is a conscious act of self-expression and identity formation, highlighting the political nature of language and its role in expressing national identity during times of conflict. The significance of these results lies in their demonstration of a large-scale, observable shift in language use in response to a major geopolitical event, contributing to our understanding of language dynamics and identity in the digital age. The observed shift is consistent with other research on post-Soviet language dynamics and provides valuable insights into the impact of geopolitical conflict on language attitudes and practices. The large-scale, longitudinal design of the study supports stronger inferences about the effect of the war than previous research relying on smaller samples and shorter time frames, subject to the sampling limitations of Twitter data.
Conclusion
This study provides compelling evidence of a significant shift in language use on Ukrainian social media in response to the Russian invasion. The findings reveal a substantial increase in Ukrainian and a decrease in Russian usage, primarily driven by a conscious behavioral change among users. This highlights the intertwined nature of language, identity, and political expression in online spaces. Future research could investigate the content and sentiment of tweets to gain a deeper understanding of the motivations behind language shifts, explore the role of social networks in influencing language choices, and analyze similar language dynamics in other geopolitical contexts.
Limitations
The study acknowledges several limitations. The sample of Twitter users is not representative of the entire Ukrainian population, potentially being skewed towards younger demographics and those with greater online engagement. Geo-information is not universally included on Twitter, which might further bias the sample. The study cannot track users who create new accounts, potentially underestimating behavioral effects. Users may stop tweeting for various reasons, such as fleeing the country, potentially introducing a selection bias in the analysis of language shifts occurring after the outbreak of war. Future studies could address these limitations through a more diverse sampling strategy and the inclusion of data from other social media platforms.