Unveiling evolving nationalistic discourses on social media: a cross-year analysis in pandemic
X. Wu, G. Gu, et al.
This study analyzes how nationalistic discourse on social media evolved during the COVID-19 pandemic, based on 2.65 million tweets. The researchers, Xiao-Kun Wu, Gang Gu, Tian-Tian Xie, Tian-Fang Zhao, and Chao Min, identify three distinct frames ('feeling,' 'identity,' and 'action') and examine how emotions, identity, and actions interact in shaping online nationalist narratives.
Introduction
The COVID-19 pandemic profoundly altered public health and collective sentiment, with social media (e.g., Twitter, Facebook, Quora) becoming central arenas for sharing experiences and opinions. Prior research has examined negative emotions, country image, racism, nationalism, and hate speech during epidemics and has used diverse computational methods (machine learning, semantic networks). Two key challenges persist: (1) capturing the rapid, evolving nature of online discourse as the pandemic unfolds and (2) integrating computational/quantitative methods with qualitative approaches for interpretation and theory-building. This paper addresses these by proposing an evolving discourse perspective on nationalism and integrating framing theory to categorize discourse into feeling, action, and identity. It develops a mixed-methods pipeline combining transformer-based sentiment/emotion analysis and BERTopic-based topic modeling with a qualitative Evolving Discourse Framework Analysis. The study compares discourse across 2020, 2021, and 2022 to understand how nationalist frames and public sentiment/emotion shifted over time.
Literature Review
The paper situates nationalism as an enduring, multifaceted ideology linked to nation-state formation and identity (Anderson’s ‘imagined community’; Gellner; Tilly; Calhoun). It distinguishes nationalism from racism, patriotism, and populism, noting overlaps and differences. With digital media, ‘imagined nationalism’ is reconfigured online, where the internet acts as a re-embedding technology that can reinforce or fragment identities. Pandemic conditions amplified nationalist and nativist sentiments and anti-Asian racism on social media. Discourse on social platforms reflects democratic interaction but is also shaped by power dynamics, including media and political actors. Sentiment analysis and topic modeling have been widely used to track emotions and themes in COVID-19 tweets, showing an early dominance of negative emotions and shifting sentiment over time. Gaps remain in diachronic (evolving) analyses that integrate computational techniques with theoretical discourse frameworks to explain how nationalist frames (identity, action, feeling) transform over time.
Methodology
Datasets: (1) China-oriented Twitter datasets compiled for three periods: Mar 23–Jul 03, 2020 (1,042,627 tweets collected; 1,039,566 after cleaning), Mar 18–Apr 15, 2021 (764,030 collected; 763,930 after cleaning), Mar 19–Jul 19, 2022 (1,077,888 collected; 1,048,556 after cleaning). Tweets were retrieved via Twitter API using combined keyword sets: China-related {'china','chinese','P.R.China'} and COVID-19 {'covid-19','covid','covid19','coronavirus','corona','sars cov 2','ncov 2019'}. Fields: Date(GMT), UserId, UserScreenName, UserName, Text, Platform, Type, RetweetCount, FavoriteCount, URL. Duplicates and incomplete items removed. (2) WHO official data (Jan 2020–Sep 2022) on confirmed cases, deaths, recoveries to contextualize temporal patterns.
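The keyword filtering and cleaning described above can be sketched as follows. This is a minimal illustration, not the authors' collection code: it assumes that "combined" means a tweet must contain at least one China-related term and at least one COVID-19 term, that a duplicate is the same user posting the same text, and that `Date`, `UserId`, and `Text` are the fields required for a record to count as complete. The helper names are hypothetical.

```python
import re

# Keyword sets as listed in the dataset description.
CHINA_TERMS = ["china", "chinese", "P.R.China"]
COVID_TERMS = ["covid-19", "covid", "covid19", "coronavirus", "corona",
               "sars cov 2", "ncov 2019"]

def build_pattern(terms):
    """Case-insensitive pattern that matches any term as a whole token."""
    # Longest terms first so 'covid-19' is tried before 'covid'.
    alternatives = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)",
                      re.IGNORECASE)

CHINA_RE = build_pattern(CHINA_TERMS)
COVID_RE = build_pattern(COVID_TERMS)

def matches_query(text):
    """Assumption: a tweet qualifies only if it hits both keyword sets."""
    return bool(CHINA_RE.search(text)) and bool(COVID_RE.search(text))

# Assumption: these three fields must be non-empty for a complete record.
REQUIRED_FIELDS = ("Date", "UserId", "Text")

def clean(tweets):
    """Drop incomplete records, off-topic tweets, and duplicates."""
    seen = set()
    kept = []
    for tweet in tweets:
        if any(not tweet.get(field) for field in REQUIRED_FIELDS):
            continue  # incomplete item
        if not matches_query(tweet["Text"]):
            continue  # does not match both keyword sets
        # Assumption: same user + same normalized text counts as a duplicate.
        key = (tweet["UserId"], tweet["Text"].strip().lower())
        if key in seen:
            continue
        seen.add(key)
        kept.append(tweet)
    return kept
```

Whole-token matching keeps words such as "Chinatown" from counting as a China-related hit; whether the original query behaved this way is not stated in the source.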
Sentiment and emotion analysis: Utilized CardiffNLP transformer models tuned for Twitter. Steps: (1) Corpora preparation: use raw tweet text including URLs, emoticons, stop words. (2) Sentiment recognition: classify into positive/neutral/negative with probabilities in [0,1]; English model trained on SemEval-2017; multilingual on UMSAB. (3) Emotion recognition: classify into anger, joy, sadness, optimism using TweetEval (RoBERTa-base) trained on SemEval-2018 Task 1 (Affect in Tweets). (4) Descriptive statistics: take top-scoring label per tweet and compute average label probabilities and distributions overall and for influential tweets (top 1,000 by retweets or favorites).
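Step (4) can be illustrated with a short sketch. It assumes each tweet already carries a dictionary of label probabilities produced by a classifier (the scores below are hand-written toy values, not model output), and the function names and the `RetweetCount` ranking helper are illustrative choices rather than the authors' code.

```python
from collections import Counter

LABELS = ("negative", "neutral", "positive")

def top_label(scores):
    """Return the label with the highest probability for one tweet."""
    return max(scores, key=scores.get)

def summarize(score_rows):
    """Share of tweets per top-scoring label, and mean probability per label."""
    n = len(score_rows)
    if n == 0:
        raise ValueError("no tweets to summarize")
    counts = Counter(top_label(row) for row in score_rows)
    distribution = {label: counts.get(label, 0) / n for label in LABELS}
    mean_prob = {label: sum(row[label] for row in score_rows) / n
                 for label in LABELS}
    return distribution, mean_prob

def top_k_by(tweets, field, k=1000):
    """Select the k most influential tweets by a count field."""
    return sorted(tweets, key=lambda t: t[field], reverse=True)[:k]

# Toy data: probabilities are made up for illustration.
tweets = [
    {"RetweetCount": 50, "scores": {"negative": 0.7, "neutral": 0.2, "positive": 0.1}},
    {"RetweetCount": 5,  "scores": {"negative": 0.1, "neutral": 0.8, "positive": 0.1}},
    {"RetweetCount": 90, "scores": {"negative": 0.6, "neutral": 0.3, "positive": 0.1}},
    {"RetweetCount": 1,  "scores": {"negative": 0.2, "neutral": 0.5, "positive": 0.3}},
]

overall = summarize([t["scores"] for t in tweets])
influential = summarize([t["scores"] for t in top_k_by(tweets, "RetweetCount", k=2)])
```

Running the same summary on the full set and on the top-k subset is what allows the overall and influential distributions to be compared.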
Topic modeling: BERTopic applied to uncover latent topics in short texts. Steps: (1) Clean text and remove stop words (UCD list). (2) Generate sentence embeddings via SBERT and cluster with HDBSCAN (soft clustering; treat noise as outliers). (3) Represent topics with class-based TF-IDF (c-TF-IDF) to obtain topic-word distributions. For interpretability, 15 topics with 25 n-terms per topic were selected. Inter-topic distance maps and hierarchical clustering (cosine distance) used to visualize topic relations and similarities across years.
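The c-TF-IDF step can be made concrete with a small sketch. It follows the weighting given in the BERTopic documentation, W(t, c) = tf(t, c) · log(1 + A / f(t)), where tf(t, c) is the frequency of term t in topic c, f(t) is its frequency across all topics, and A is the average number of words per topic. The sketch assumes the documents are already clustered and stop words already removed, uses whitespace tokenization, and skips the extra normalization the library applies, so it is an approximation of the idea rather than the library's implementation.

```python
import math
from collections import Counter

def c_tf_idf(docs_by_topic, top_n=3):
    """Rank terms per topic with a simplified class-based TF-IDF."""
    # Term frequencies per topic: all documents of a topic are pooled.
    tf = {topic: Counter(word for doc in docs for word in doc.lower().split())
          for topic, docs in docs_by_topic.items()}

    # Frequency of each term across all topics.
    total = Counter()
    for counts in tf.values():
        total.update(counts)

    # A: average number of words per topic.
    avg_words = sum(total.values()) / len(tf)

    top_terms = {}
    for topic, counts in tf.items():
        scores = {term: freq * math.log(1 + avg_words / total[term])
                  for term, freq in counts.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        top_terms[topic] = ranked[:top_n]
    return top_terms

# Toy clusters (hypothetical text, not from the study's corpus).
toy_topics = {
    "lockdown": ["shanghai lockdown zero covid", "lockdown shanghai residents"],
    "vaccine": ["vaccine rollout covid", "vaccine doses shipped"],
}
print(c_tf_idf(toy_topics, top_n=2))
```

Terms that appear in every topic, such as "covid" in the toy data, receive a lower weight than terms concentrated in one topic, which is why they tend to drop out of the topic representation.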
Co-occurrence network and qualitative coding: High-frequency word co-occurrence networks were built (nodes=words, edges=co-occurrence within sentences) and visualized in Gephi; graph density and modularity tracked over time. For qualitative analysis, an Evolving Discourse Framework Analysis mapped content into three frames—feeling, action, identity—grounded in nationalism theory. The top 50 most-retweeted tweets per year (n=150) were manually coded by three trained coders with separate coding, comparison, and correction to examine frame-emotion combinations.
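The two network metrics tracked here can be sketched with the standard definitions: density of an undirected graph is 2E / (N(N − 1)), and modularity for a given partition is Q = Σ_c [w_in(c) / m − (deg(c) / 2m)²], where m is the total edge weight. The study computed these in Gephi, which also detects the communities; in this sketch the community assignment is supplied by hand, the edge weights are raw co-occurrence counts, and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(sentences, vocabulary):
    """Count how often two vocabulary words appear in the same sentence."""
    edges = Counter()
    for sentence in sentences:
        present = sorted(set(sentence.lower().split()) & vocabulary)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

def density(edges):
    """Undirected graph density: 2E / (N * (N - 1))."""
    nodes = {word for pair in edges for word in pair}
    n = len(nodes)
    if n < 2:
        return 0.0
    return 2 * len(edges) / (n * (n - 1))

def modularity(edges, community):
    """Weighted modularity of a given partition (word -> community id)."""
    m = sum(edges.values())
    if m == 0:
        return 0.0
    internal = Counter()  # edge weight inside each community
    degree = Counter()    # summed weighted degree of each community
    for (a, b), weight in edges.items():
        degree[community[a]] += weight
        degree[community[b]] += weight
        if community[a] == community[b]:
            internal[community[a]] += weight
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Toy example: two word triangles joined by a single bridging sentence.
vocab = {"china", "virus", "blame", "lockdown", "shanghai", "zero"}
sentences = ["china virus blame", "shanghai lockdown zero", "blame lockdown"]
groups = {"china": 0, "virus": 0, "blame": 0,
          "lockdown": 1, "shanghai": 1, "zero": 1}

edges = cooccurrence_edges(sentences, vocab)
print(density(edges), modularity(edges, groups))
```

Adding more bridging sentences to the toy data raises density and lowers modularity, the same direction of change the study reports between 2021 and 2022.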
Key Findings
- Evolutionary trends: Tweet volume tracked pandemic severity, with fluctuations more synchronized with death increases than with new confirmed cases.
- Sentiment distribution (overall): 2020—Positive 2.9% (31,113), Neutral 66.3% (689,493), Negative 30.6% (318,960); 2021—Positive 5.0% (38,499), Neutral 64.9% (495,936), Negative 30.0% (229,495); 2022—Positive 11.1% (116,857), Neutral 45.8% (480,453), Negative 43.0% (451,246). Neutral declines over time; polarized sentiment rises, with a marked increase in negative sentiment in 2022.
- Influential tweets (top 1,000 by retweets or favorites) end up markedly more negative than the overall sample, although only in 2022. Among top retweets, Negative dips and then jumps (26.0%→24.0%→44.1% for 2020→2021→2022), Neutral falls 69.3%→64.7%→45.2%, and Positive moves 4.7%→11.2%→10.7%. Among top favorites, Negative moves 28.7%→23.1%→56.2%, Neutral falls 65.4%→63.1%→30.9%, and Positive moves 5.8%→13.8%→12.9%. Influential tweets display earlier and steeper polarization than the overall sample.
- Emotion distribution (overall): 2020 (Anger 50.7%, Joy 22.7%, Optimism 12.3%, Sadness 14.3%); 2021 (Anger 38.7%, Joy 40.2%, Optimism 9.9%, Sadness 11.2%); 2022 (Anger 35.7%, Joy 16.8%, Optimism 14.2%, Sadness 33.4%). Anger decreases over time while sadness spikes in 2022. Joy peaks in 2021 and then falls sharply, so combined positive emotion (joy plus optimism) is highest in 2021 at 50.1%, even though optimism itself dips that year and recovers in 2022.
- Emotions in influential tweets (top 1,000): Combined anger+sadness (retweets) 57.0%→45.3%→70.7%; (favorites) 63.7%→44.3%→71.3%. Influential tweets exhibit larger swings and stronger pessimistic emotions in 2022.
- Topics and similarity: 15 topics (25 n-terms each) selected for interpretability. Hierarchical clustering shows topic differentiation shifts over time; topic similarity increases in 2021 and becomes more uniformly high in 2022, implying reduced topical diversity.
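- Co-occurrence networks: Graph density dips in 2021 and then reaches its highest value in 2022, while modularity decreases steadily (Density: D_2020=0.866, D_2021=0.767, D_2022=0.919; Modularity: M_2020=0.475, M_2021=0.280, M_2022=0.146). The study reads this as a more diverse and fragmented concept landscape in 2022; in network terms, the falling modularity means that high-frequency words are less cleanly separated into distinct communities.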
- Evolving discourse frames: Feeling frame rises, identity frame declines, action frame remains relatively stable with a slight uptick in 2022. 2020 emphasized nation-state identity and hostility/punishment; 2021 emphasized actions and consequences (e.g., protests, vaccines); 2022 focused on city-level lockdowns and ‘zero-covid’ strategies with heightened emotional language.
- Frame-emotion combinations (coded top retweets): Sadness is prevalent across years; anger highest in 2020 (n=23), drops in 2021 (n=5), resurges in 2022 (n=14). In action frame, focus shifts from programs (2020) to broader concerns (2021) and to programs plus strategies (2022). In feeling frame, sadness peaks in 2021; sympathy increases in 2022. In identity frame, nationality-centric discourse often aligns with anger, especially early in the pandemic.
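The overall sentiment shares listed above can be re-derived from the reported tweet counts. The sketch below is a simple arithmetic check; the counts come from the findings, while the observation that the published percentages match truncation to one decimal (rather than rounding) is an inference from the numbers, not something the source states.

```python
# Reported counts of tweets per top-scoring sentiment label.
COUNTS = {
    2020: {"positive": 31_113, "neutral": 689_493, "negative": 318_960},
    2021: {"positive": 38_499, "neutral": 495_936, "negative": 229_495},
    2022: {"positive": 116_857, "neutral": 480_453, "negative": 451_246},
}

def truncated_percent(count, total):
    """Percentage truncated (not rounded) to one decimal place."""
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (count * 1000 // total) / 10

def shares(year_counts):
    """Per-label percentage of all tweets in one year."""
    total = sum(year_counts.values())
    return {label: truncated_percent(count, total)
            for label, count in year_counts.items()}

def polarized_share(year_counts):
    """Share of tweets that are not neutral (positive plus negative)."""
    total = sum(year_counts.values())
    return 100 * (year_counts["positive"] + year_counts["negative"]) / total

for year, year_counts in COUNTS.items():
    print(year, shares(year_counts), f"polarized={polarized_share(year_counts):.1f}%")
```

The per-year totals equal the cleaned dataset sizes (1,039,566, 763,930, and 1,048,556), and the non-neutral share grows from about a third of tweets in 2020 to more than half in 2022.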
Discussion
The study demonstrates that nationalist discourse on Twitter evolved across three pandemic years, with growing emotional salience (feeling), waning identity-centric rhetoric, and relatively stable action-oriented discussion. These shifts align with pandemic phases: early-stage identity attributions and hostility, mid-stage action-oriented problem-solving (vaccines, protests), and later-stage localized control measures (lockdowns, zero-covid) accompanied by renewed pessimistic emotions. Influential tweets amplify polarization and pessimistic affect more than the overall corpus, suggesting a disproportionate role of highly visible content in shaping public mood. Topic similarity patterns and co-occurrence network metrics point to a reorganizing discourse space with reduced topical diversity yet more fragmented key concepts by 2022. Together, the quantitative analyses (sentiment, emotion, topics, networks) and the qualitative framing analysis explain how online nationalist narratives interlink feeling, identity, and action: feeling frames both respond to and reshape identity boundaries, which in turn channel into action-oriented rhetoric. These findings advance understanding of online nationalism dynamics during global crises and underscore the importance of integrating computational evidence with theory-driven interpretation.
Conclusion
This work contributes a mixed-methods framework that integrates transformer-based sentiment/emotion analysis, BERTopic, and discourse framing theory to track evolving nationalist discourse during the pandemic. It shows a rising feeling frame, declining identity frame, and stable action frame across 2020–2022; influential tweets exhibit earlier and stronger polarization. High-frequency terms and co-occurrence networks reveal shifting foci from nation-state identity and hostility (2020), to actions and consequences (2021), to city-level measures and zero-covid strategies (2022), with increasing fragmentation of key concepts. Future research should (1) integrate advanced NLP such as self-supervised learning and AIGC models (e.g., ChatGPT) to reduce error rates and enhance interpretability and (2) develop comprehensive mixed-methods frameworks that align computational techniques with social theory, supported by interdisciplinary teams.
Limitations
The study has two principal limitations: (1) it relies on unsupervised machine learning for sentiment and emotion classification, which can introduce errors in unstructured social media text that are difficult to correct at scale; (2) the mixed-methods approach depends on pre-established social theories (framing and nationalism), which may make it harder to transfer to other contexts or datasets, and it requires interdisciplinary expertise.