Emotions Unveiled: Detecting COVID-19 Fake News on Social Media

B. Farhoudinia, S. Ozturkcan, et al.

This study by Bahareh Farhoudinia, Selcen Ozturkcan, and Nihat Kasap examines the role of emotions in detecting COVID-19 fake news on social media. By analyzing the sentiments and emotions associated with fake and real news, it shows that negative emotions are more prominent in fake news and that adding emotional features to machine learning models improves detection performance.

Introduction

Social media has transformed human life by enabling global connectivity and offering gratifications such as information seeking, entertainment, communication, and opinion expression. Despite these benefits, social media has facilitated the rise of fake news, which threatens public trust, democracy, justice, freedom of expression, and the economy. Notable impacts include the 2016 U.S. election and corporate harms (e.g., a false Pepsi CEO story). During COVID-19, misinformation spread rapidly, causing harmful behaviors and panic. This study, part of a completed PhD thesis, analyzes emotions and sentiments elicited by fake news in the COVID-19 context to explore how emotions can aid fake news detection. The research questions are: (1) How do sentiments associated with real and fake news differ? (2) How do emotions elicited by fake news differ from those elicited by real news? (3) What emotions are most prevalent in fake news? (4) How can these emotions be used to recognize fake news on social media? The paper reviews related studies, details methods, reports results and analyses, discusses limitations, and concludes with implications.

Literature Review

Research on fake news expanded after the 2016 U.S. election and spans multiple disciplines, leading to varied definitions (misinformation vs. disinformation). Fake content can include manipulated images and deepfakes, which are difficult for humans to detect. Cognitive mechanisms such as confirmation bias and reliance on fast, intuitive “system-one” thinking contribute to the belief and spread of fake news, amplified by social media echo chambers. Fake news detection approaches are commonly categorized as: (i) content-based (linguistic/content features), (ii) social context (user/account features and relationships), and (iii) propagation-based (cascade structures). Machine learning (logistic regression, decision trees, random forests, naïve Bayes, SVM) and deep learning (CNN, LSTM) are widely used, with pretrained language models like BERT showing promise. While prior work has explored sentiment’s role in fake news, detailed examination of specific emotions is underexplored. Studies indicate fake news tends toward negative sentiment and that sentiment-aware models can perform better. Reasoning and analytical thinking reduce susceptibility to fake news, implying an emotion–reason tradeoff. In the COVID-19 context, prior research has analyzed public sentiment but gaps remain about characteristics and spread of fake news and the role of specific emotions. This study addresses these gaps by extracting sentiments and eight basic emotions from tweets to classify fake vs. real COVID-19 news.

Methodology

The study employs a multi-step approach: (a) sentiment extraction using lexicons (VADER, TextBlob, SentiWordNet); (b) emotion extraction using the NRC emotion lexicon (eight basic emotions: joy, trust, fear, surprise, sadness, anticipation, anger, disgust); and (c) fake news detection using machine learning (random forest, naïve Bayes, SVM) and a deep learning model (BERT), comparing performance with and without emotion features.

Dataset: An open, publicly available dataset of 10,700 English tweets related to COVID-19, labeled as real (5600) or fake (5100), compiled by Patwa et al. (2021) with tweets from August–September 2020. Fake news data were sourced from fact-checking websites (e.g., PolitiFact, Snopes) and social media, with manual verification; real news came from official/verified sources and was human-reviewed for relevance.

Preprocessing: Non-alphabetic characters were removed, text was lowercased, stop words were removed, and lemmatization was applied; text was converted to quantitative features using scikit-learn’s ordinal encoder.

Sentiment analysis and lexicon selection: Three lexicons (VADER, TextBlob, SentiWordNet) were evaluated via: (1) comparison to a manually labeled subset (positive/negative/neutral), reporting 2-class and 3-class metrics; (2) focusing on positive vs. negative tweets (ignoring neutral) due to asymmetric misclassification costs; and (3) training a random forest fake news classifier using each lexicon’s sentiment outputs (excluding numeric-heavy tweets, about 20% of the dataset) to assess downstream detection power. VADER yielded the best overall performance and was selected as the primary sentiment lexicon.

Emotion extraction: Using the NRC lexicon, each tweet received scores for eight emotions; the highest-scoring emotion was assigned as the tweet’s dominant emotion. Features for detection models included tweet text, VADER sentiment, and eight NRC emotion scores.

Modeling: Data were split 80% train / 20% test. ML models (random forest, naïve Bayes, SVM) were implemented with default scikit-learn hyperparameters; performance was compared with and without emotion features on non-numerical tweets.

Deep learning: A BERT classifier was fine-tuned with tokenization, truncation/padding to max length 128, AdamW optimizer (learning rate 1e-5), and 5-fold CV to select 3 training epochs. Training was performed in Google Colab; evaluation used the held-out test set.

Statistical analysis: Two-sample t-tests (Python pingouin) assessed significance of emotion differences between fake and real tweets.
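
The preprocessing and emotion-extraction steps described above can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code: `TOY_LEXICON` and `STOP_WORDS` are small hypothetical stand-ins (the study used the NRC emotion lexicon and a full stop-word list), lemmatization is omitted, and the scoring rule (share of tokens associated with each emotion) is an assumption about how a lexicon score might be computed.

```python
import re
from collections import Counter

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

# Hypothetical toy lexicon (NOT the NRC lexicon): word -> associated emotions.
TOY_LEXICON = {
    "virus": ["fear"],
    "death": ["fear", "sadness"],
    "hoax": ["anger", "disgust"],
    "vaccine": ["trust", "anticipation"],
    "official": ["trust"],
    "cure": ["joy", "anticipation"],
}

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "for"}


def preprocess(text):
    """Lowercase, drop non-alphabetic characters, and remove stop words."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    return [tok for tok in letters_only.split() if tok not in STOP_WORDS]


def emotion_scores(tokens):
    """Share of tokens associated with each emotion (0.0 for an empty tweet)."""
    counts = Counter()
    for tok in tokens:
        counts.update(TOY_LEXICON.get(tok, []))
    total = len(tokens)
    return {e: (counts[e] / total if total else 0.0) for e in EMOTIONS}


def dominant_emotion(scores):
    """Highest-scoring emotion, or None when no lexicon word matched.

    Ties are broken by the order of EMOTIONS, which is an arbitrary choice.
    """
    best = max(EMOTIONS, key=lambda e: scores[e])
    return best if scores[best] > 0 else None


tokens = preprocess("The virus is a hoax: 5G causes death and virus spread!")
scores = emotion_scores(tokens)
label = dominant_emotion(scores)  # "fear" for this made-up tweet
```

The eight values in `scores` correspond to the per-tweet emotion features, and `label` to the dominant-emotion assignment used in the distribution analysis.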

Key Findings
  • VADER outperformed TextBlob and SentiWordNet as a sentiment lexicon for downstream fake news detection using random forest on non-numerical tweets (accuracy: VADER 0.738; TextBlob 0.735; SentiWordNet 0.728).
  • Sentiment distribution (VADER) by class (percent): Fake: Negative 39.31, Neutral 29.53, Positive 31.15; Real: Negative 35.20, Neutral 18.35, Positive 46.45. Fake news contained more negative and less positive sentiment than real news.
  • Emotion distribution (NRC), dominant-emotion percentages: Fake: Anger 3.70, Anticipation 4.51, Disgust 1.18, Fear 66.81, Joy 1.42, Sadness 3.80, Surprise 1.20, Trust 17.38; Real: Anger 2.12, Anticipation 9.93, Disgust 0.22, Fear 50.67, Joy 1.50, Sadness 6.38, Surprise 1.58, Trust 27.14. Fear and trust were the most common in both classes; fake news showed higher fear/anger/disgust, real news higher anticipation/surprise/trust.
  • Emotion intensity (mean scores): Fake: Anger 0.033, Anticipation 0.023, Disgust 0.025, Fear 0.097, Joy 0.028, Sadness 0.064, Surprise 0.018, Trust 0.097; Real: Anger 0.020, Anticipation 0.028, Disgust 0.015, Fear 0.076, Joy 0.027, Sadness 0.064, Surprise 0.022, Trust 0.126. Negative emotion intensities (fear, anger, disgust) were higher in fake news; positive emotions (anticipation, surprise, trust) were stronger in real news.
  • Statistical significance (two-sample t-tests): Significant differences for fear (p=6.57E-12), anger (p=4.17E-16), trust (p=8.74E-13), surprise (p=0.007362), disgust (p=2.16E-14), anticipation (p=1.86E-39); not significant for sadness (p=0.984772) and joy (p=0.318163). Largest between-class differences: trust (+5.92% in real), fear (−5.33% in real), anticipation (+3.05% in real).
  • Fake news detection performance (non-numerical tweets), with emotion features: Random forest Acc 0.81, Prec 0.85, Rec 0.94, F1 0.89; Naïve Bayes Acc 0.49, Prec 0.69, Rec 0.53, F1 0.69; SVM Acc 0.76, Prec 0.74, Rec 0.95, F1 0.85.
  • Fake news detection performance (non-numerical tweets), without emotion features: Random forest Acc 0.79, Prec 0.87, Rec 0.88, F1 0.87; Naïve Bayes Acc 0.66, Prec 0.70, Rec 0.91, F1 0.80; SVM Acc 0.71, Prec 0.71, Rec 0.94, F1 0.83. Emotions improved RF and SVM, but naïve Bayes performed better without emotion features.
  • BERT performance: With emotion scores: Acc 0.972, Prec 0.983, Rec 0.970, F1 0.976; Without emotion scores: Acc 0.961, Prec 0.981, Rec 0.956, F1 0.967. Adding emotion features improved BERT.
  • Feature importance (random forest): Anticipation, trust, and fear were the most important emotion features, aligning with observed distribution differences.

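The significance tests reported above compare per-tweet emotion scores between the fake and real groups. The study ran them with pingouin; the sketch below is a standard-library stand-in that assumes the unequal-variance (Welch) form of the two-sample t-test, which the summary does not specify. The function names and sample values are hypothetical, and the p-value uses a normal approximation that is only reasonable for large groups such as the thousands of tweets per class in the dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


def approx_two_sided_p(t_stat):
    """Normal approximation to the two-sided p-value (large samples only)."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))


# Made-up per-tweet fear scores, for illustration only. With six values per
# group the normal approximation is too optimistic; it is shown here just to
# exercise the functions.
fake_fear = [0.12, 0.10, 0.09, 0.11, 0.08, 0.13]
real_fear = [0.07, 0.08, 0.06, 0.09, 0.07, 0.08]
t_stat, dof = welch_t(fake_fear, real_fear)
p_value = approx_two_sided_p(t_stat)
```

A positive `t_stat` here means the first group (fake) has the higher mean score; running the same comparison once per emotion would give a set of p-values of the kind listed above.
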
Discussion

The study demonstrates that fake news on social media tends to invoke negative sentiments and emotions more than real news does, consistent with negativity bias, whereby negative information exerts a stronger influence. Real news more frequently elicits positive sentiments and emotions such as anticipation, surprise, and trust. Both frequency and intensity analyses corroborate that fear, anger, and disgust are more pronounced in fake news, while trust, anticipation, and surprise are stronger in real news. Incorporating emotion features improved the performance of random forest, SVM, and BERT fake news detection models, highlighting the utility of emotional signals beyond the content and social-context features commonly used in prior work. These findings align with and extend earlier research showing negativity in fake news and more upbeat tones around real news. The work underscores the value of modeling specific emotions, not just polarity, in automated detection systems and provides insight into user engagement dynamics. The implications are interdisciplinary, informing communicators, managers, psychologists, sociologists, and policymakers who seek to mitigate misinformation.

Conclusion

The research proposed and validated novel emotion-based features for fake news detection and examined how sentiments and specific emotions differ between fake and real COVID-19 tweets. VADER emerged as the most effective sentiment lexicon for this task. Fake news contained more negative sentiment and elicited stronger negative emotions (fear, disgust, anger), while real news showed more positive sentiment and stronger positive emotions (anticipation, joy, surprise), with fear and trust being the most prevalent overall. Integrating emotion features into machine learning (random forest, SVM) and deep learning (BERT) models improved detection performance; anticipation, trust, and fear were key differentiators per feature importance analysis. The approach can inform fake news detection in other domains (politics, sports, advertising) and provide broader insights into public emotional states during crises. Future work could enhance models with richer emotion representations and extend analyses beyond COVID-19.

Limitations
  • Temporal and topical scope: The dataset covers August–September 2020 and may not represent the entire pandemic period or other contexts, limiting generalizability.
  • Emotion assignment method: Each tweet was assigned its highest-scoring emotion; alternative methods (e.g., using full score distributions or intervals) may capture nuances better.
  • Feature set and models: While emotions improved several models, naïve Bayes performed better without them; additional or pretrained emotion features/models might yield further improvements.
  • External validity: Findings are based on COVID-19-related English tweets; replication across languages, platforms, and crises is needed.
  • Future research directions: Experimental field studies to test the hypothesis about emotional differences and evaluations in non-COVID emergency scenarios are suggested.
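
To make the emotion-assignment limitation concrete, the sketch below contrasts a dominant-emotion label with a full score distribution. It is an illustration under assumptions, not a method from the study: the function names and the example scores are hypothetical, and normalizing scores to sum to 1 is only one possible way to represent a "full distribution".

```python
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]


def dominant_one_hot(scores):
    """Keep only the top emotion, discarding how close the others were."""
    best = max(EMOTIONS, key=lambda e: scores.get(e, 0.0))
    if scores.get(best, 0.0) <= 0:
        return [0.0] * len(EMOTIONS)
    return [1.0 if e == best else 0.0 for e in EMOTIONS]


def full_distribution(scores):
    """Normalize all eight scores so they sum to 1 (all zeros if no signal)."""
    total = sum(scores.get(e, 0.0) for e in EMOTIONS)
    if total == 0:
        return [0.0] * len(EMOTIONS)
    return [scores.get(e, 0.0) / total for e in EMOTIONS]


# Hypothetical scores for one tweet: fear only narrowly ahead of trust.
example = {"fear": 0.30, "trust": 0.25, "anger": 0.05}
one_hot = dominant_one_hot(example)       # only the "fear" slot is set
distribution = full_distribution(example)  # fear 0.5, trust about 0.42
```

With the one-hot form, this tweet is indistinguishable from one that expresses fear alone; the distribution keeps the near-tie with trust available to a downstream classifier.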