The proliferation of misinformation online poses a significant challenge. Existing efforts to combat it, such as user reporting and fact-checking, are hampered by the volume, variety, and speed at which misinformation spreads. This paper proposes analyzing the inherent characteristics, or "fingerprints," of misinformation to improve detection and mitigation. These fingerprints include the cognitive effort required to process the content (measured by readability and lexical diversity) and its emotional appeal (measured by sentiment analysis and appeal to morality). The study draws on information manipulation theory and the limited capacity model of motivated mediated message processing to explain how these factors contribute to the spread of misinformation.
Literature Review
Existing research often reduces the problem to a binary distinction between factual news and fake news. This study widens the scope by examining several distinct types of misinformation. The literature review surveys previous attempts to identify misinformation using computational linguistics and psychological principles, emphasizing scalability, breadth of analysis, and timeliness of detection. It also discusses the limitations of existing methods, such as reliance on fact-checking that occurs only after content has gone viral.
Methodology
The study uses the Fake News Corpus, containing 9.4 million news items from 194 websites, categorized into seven types of content. After a rigorous data selection process, the final dataset comprises 92,112 articles. Cognitive effort is measured with the Flesch-Kincaid readability score and perplexity (a proxy for lexical diversity). Emotional appeal is assessed through sentiment analysis (using the AFINN lexicon) and an appeal-to-morality measure (using a validated moral foundations dictionary). A multinomial logistic regression model, implemented as a neural network, relates these features to the content categories. Hierarchical and k-means clustering are used to visualize similarities between categories, and simulations with first differences explore how the predicted probability of each category changes under different scenarios.
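To make these measures concrete, here is a minimal sketch in Python of how the four features might be computed, assuming the `textstat` and `afinn` packages. The unigram self-perplexity and the `MORAL_WORDS` mini-lexicon are toy stand-ins for the language-model perplexity and the validated moral foundations dictionary used in the study.

```python
# pip install textstat afinn
import re
from collections import Counter
from math import exp, log

import textstat
from afinn import Afinn

afinn = Afinn()

# Hypothetical mini-lexicon standing in for the validated moral
# foundations dictionary used in the paper.
MORAL_WORDS = {"harm", "care", "fair", "loyal", "betray", "pure", "honor"}

def unigram_perplexity(tokens):
    """Toy self-perplexity from a unigram model fit on the text itself.

    Lower values indicate more repetitive, less lexically diverse text.
    """
    if not tokens:
        return 1.0
    counts = Counter(tokens)
    total = len(tokens)
    avg_nll = -sum(log(counts[t] / total) for t in tokens) / total
    return exp(avg_nll)

def fingerprint(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    n = max(len(tokens), 1)
    return {
        # Cognitive effort: higher reading ease = easier to process.
        "readability": textstat.flesch_reading_ease(text),
        "perplexity": unigram_perplexity(tokens),
        # Emotional appeal: summed AFINN valence, normalized by length.
        "sentiment": afinn.score(text) / n,
        # Appeal to morality: share of tokens found in the moral lexicon.
        "morality": sum(t in MORAL_WORDS for t in tokens) / n,
    }

print(fingerprint("They betray the public trust, and no one seems to care."))
```

The modeling step could then be approximated as follows: a sketch assuming a standardized feature matrix `X` (readability, perplexity, sentiment, morality) and per-article category labels `y`. Random stand-ins replace the real corpus, and scikit-learn's multinomial `LogisticRegression` stands in for the paper's neural-network implementation of the same model.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Random stand-ins for the real feature matrix and category labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # readability, perplexity, sentiment, morality
y = rng.choice(["factual", "fake news", "clickbait", "conspiracy"], size=500)

X_std = StandardScaler().fit_transform(X)

# Multinomial logit: the lbfgs solver fits a softmax over all categories.
model = LogisticRegression(max_iter=1000).fit(X_std, y)
print(dict(zip(model.classes_, model.predict_proba(X_std[:1])[0].round(3))))

# Cluster per-category feature centroids to see which categories resemble
# each other, echoing the paper's hierarchical and k-means analyses.
cats = model.classes_
centroids = np.vstack([X_std[y == c].mean(axis=0) for c in cats])
print(dict(zip(cats, KMeans(n_clusters=2, n_init=10).fit_predict(centroids))))
Z = linkage(centroids, method="ward")  # feed to scipy's dendrogram to plot
```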
Key Findings
The key findings demonstrate that misinformation is, on average, easier to process than factual news. Misinformation also shows a higher reliance on negative sentiment and a stronger appeal to morality compared to factual news. However, these characteristics vary across the different categories of misinformation. Specifically:
* **Readability:** Misinformation, and fake news in particular, scores higher on readability than factual news; that is, it is easier to read.
* **Perplexity:** Misinformation generally exhibits lower lexical diversity (lower perplexity) than factual news.
* **Sentiment:** Hate speech shows the highest negativity, while junk science and rumors lean more towards positive sentiment than factual news.
* **Morality:** Misinformation generally demonstrates a stronger appeal to morality than factual news.
Clustering analysis reveals two main clusters of content: one grouping rumors, hate speech, conspiracy theories, and fake news; the other grouping factual content, clickbait, and junk science. The multinomial logistic regression confirms that readability, perplexity, sentiment, and appeal to morality are significant predictors of content category, and simulations show how varying these features shifts the probability that a text belongs to each category.
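As an illustration of this simulation step, the sketch below computes a first difference: how the predicted probability of each category shifts when one feature moves while the others are held at their means. It reuses hypothetical random stand-ins, as in the earlier sketch, rather than the actual corpus.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-ins for the fitted model and feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))  # readability, perplexity, sentiment, morality
y = rng.choice(["factual", "fake news", "clickbait", "conspiracy"], size=500)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Two scenarios differing only in sentiment (column 2): one standard
# deviation below vs. above its mean, all other features held at the mean.
base = X.mean(axis=0)
lo, hi = base.copy(), base.copy()
lo[2] -= X[:, 2].std()
hi[2] += X[:, 2].std()

# First difference: the change in predicted probability per category.
fd = (model.predict_proba(hi.reshape(1, -1))[0]
      - model.predict_proba(lo.reshape(1, -1))[0])
print(dict(zip(model.classes_, fd.round(3))))
```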
Discussion
The findings answer the research question: misinformation is distinguishable from factual news on the basis of its cognitive and emotional characteristics. The practical significance is that the approach offers a scalable, explainable, and timely way to identify misinformation before it goes viral. Because these characteristics vary across categories of misinformation, tailored strategies are needed for different types of deceptive content. The findings also support the development of public interest algorithms and improved fact-checking strategies.
Conclusion
This paper contributes a novel, scalable method for identifying misinformation based on its "fingerprints." The findings highlight the need for more nuanced approaches to combating misinformation, acknowledging the diverse characteristics across different categories. Future research could explore the evolution of these fingerprints over time and investigate whether misinformation is adapting to evade detection. The results also emphasize the importance of investing in media literacy to equip individuals with the skills to navigate the complex information landscape.
Limitations
The study's reliance on a specific corpus of websites may limit the generalizability of the findings. While the model is relatively simple and explainable, more complex models may offer further insights. The results should be interpreted at the aggregated category level, not at the individual source level, due to potential variations within categories. The findings are also contingent on the specific dictionaries used for sentiment and morality analysis.