Accuracy prompts are a replicable and generalizable approach for reducing the spread of misinformation

G. Pennycook and D. G. Rand

This research, conducted by Gordon Pennycook and David G. Rand, analyzes 20 experiments with over 26,000 participants. It finds that shifting users' attention toward accuracy decreases the sharing of misinformation online, and that accuracy prompts enhance sharing discernment, particularly among older and more reflective individuals.
Introduction

The paper addresses the widespread concern about online misinformation and evaluates a proactive intervention: prompting users to think about accuracy before sharing. Prior work shows a disconnect between perceived accuracy and sharing behavior; people can discern true from false when asked about accuracy, yet this discernment often does not translate into sharing decisions. The authors hypothesize that inattention to accuracy contributes to misinformation sharing and that brief accuracy prompts can improve the quality of content people choose to share. The study’s purpose is to assess the replicability and generalizability of the accuracy prompt effect across implementations, topics (politics and COVID-19), platforms (MTurk, Lucid, YouGov), and user characteristics, to inform both theory and potential deployment by platforms and policymakers.

Literature Review

Past research indicates that exposure to misinformation can increase belief, and that users’ sharing intentions are often misaligned with their ability to judge accuracy. Studies have shown that accuracy prompts (e.g., asking participants to rate the accuracy of a single neutral headline) can increase sharing discernment, with evidence from lab surveys and a large Twitter field experiment. Several questions remained open: whether prompts primarily reduce sharing of false content or increase sharing of true content, whether effects differ by political ideology or attentiveness, and whether effects decay rapidly. Various prompt formats have been explored, including evaluation, importance of accuracy, descriptive norms, PSA-style videos, reasoned vs emotional framing, and digital literacy tips. The present internal meta-analysis seeks to systematically evaluate these unresolved issues across numerous studies conducted by the authors’ group.

Methodology

Design: Internal meta-analysis of 20 experiments conducted from 2017 to 2020 with U.S. participants (total N = 26,863). All were online survey experiments in which participants who use social media were randomized to receive an accuracy prompt (various formats) or a control before indicating sharing intentions for a series of fact-checker-verified true and false news headlines presented in a Facebook-like format. Topics included politics and COVID-19; subject pools were MTurk convenience samples and more nationally representative samples from Lucid and YouGov.
Accuracy prompt variants (per Table 1): Evaluation (rate accuracy of a neutral headline; sometimes 10 headlines; sometimes with feedback), Importance (self-report importance of sharing only accurate news), Norms (others value sharing accuracy), PSA video (30-second reminder to think about accuracy), Reason (importance of reasoned vs emotional sharing), Tips (minimal digital literacy tips). Some conditions combined prompts (e.g., Evaluation + Norms/Importance/Reason; or repeated Evaluations with/without feedback).
Inclusion/exclusion: Included studies (a) were run 2017–2020, (b) used U.S. subjects, (c) were online surveys, (d) measured sharing intentions for true/false headlines with fact-checker ground truth, and (e) administered a randomized accuracy prompt vs control prior to the sharing task. Excluded were studies in which participants rated accuracy on every item before sharing (full attention treatments), non-survey/pilot platform studies, studies with non-U.S. samples, and studies lacking fact-checker ground truth. Ethics approval was obtained and informed consent was collected.
Measures: Sharing intentions were typically measured on 6-point Likert scales rescaled to [0,1]. Additional measures collected in subsets: political ideology and partisanship, age, gender, race, education, Cognitive Reflection Test (CRT), self-reported valuing of accuracy for sharing, and attention checks.
Analysis per study: Rating-level linear regressions with robust SEs clustered on participant and headline, predicting sharing from headline veracity (0=false, 1=true), treatment condition (0=control, 1=prompt), and their interaction. Coefficients are interpreted as follows: interaction = effect on sharing discernment; condition main effect = effect on sharing of false headlines; veracity main effect = baseline sharing discernment in control. Moderation models added individual differences and relevant interactions; order-effect models added trial number and interactions.
Meta-analysis: For each coefficient, study-level estimates were combined using random-effects meta-analysis. Meta-regressions examined study-level moderators (subject pool, topic, baseline control discernment). Item-level analyses correlated headline-specific treatment effects on sharing with out-of-sample perceived accuracy ratings (from pretests or parallel conditions).
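The coefficient interpretation in the per-study regression can be illustrated with a minimal Python sketch. It relies on a standard fact: when veracity and condition are both coded 0/1 and their interaction is included, the OLS coefficients equal simple contrasts of the four cell means. The ratings below are hypothetical, not data from the paper; the `(x - 1) / 5` rescaling is an assumed min-max mapping of the 6-point scale; and the function names are illustrative. The sketch does not reproduce the clustered robust standard errors.

```python
from statistics import mean


def rescale_likert(rating, low=1, high=6):
    """Linearly map a rating on [low, high] onto [0, 1] (assumed min-max rescaling)."""
    return (rating - low) / (high - low)


def saturated_coefficients(rows):
    """Return the four coefficients of sharing ~ veracity * condition.

    rows: iterable of (veracity, condition, sharing), with 0/1 codes for the
    first two entries. In a saturated 2x2 design the OLS coefficients equal
    these contrasts of cell means.
    """
    cells = {(v, c): [] for v in (0, 1) for c in (0, 1)}
    for veracity, condition, sharing in rows:
        cells[(veracity, condition)].append(sharing)
    m = {cell: mean(values) for cell, values in cells.items()}
    return {
        # Sharing of false headlines in control.
        "intercept": m[(0, 0)],
        # Effect of the prompt on sharing of false headlines.
        "condition": m[(0, 1)] - m[(0, 0)],
        # Baseline sharing discernment (true minus false) in control.
        "veracity": m[(1, 0)] - m[(0, 0)],
        # Effect of the prompt on sharing discernment (difference in differences).
        "interaction": (m[(1, 1)] - m[(0, 1)]) - (m[(1, 0)] - m[(0, 0)]),
    }


# Hypothetical ratings: (veracity, condition, 6-point Likert response).
TOY_RATINGS = [
    (0, 0, 2), (0, 0, 3),   # false headlines, control
    (1, 0, 3), (1, 0, 4),   # true headlines, control
    (0, 1, 1), (0, 1, 3),   # false headlines, prompt
    (1, 1, 3), (1, 1, 5),   # true headlines, prompt
]

rows = [(v, c, rescale_likert(r)) for v, c, r in TOY_RATINGS]
coefs = saturated_coefficients(rows)
for name, value in coefs.items():
    print(f"{name}: {value:+.3f}")
```

In this toy example the prompt lowers sharing of false headlines (negative condition coefficient) and widens the gap between true and false sharing (positive interaction), which is the pattern the summary describes.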

Key Findings
  • Overall effect on sharing discernment: Accuracy prompts significantly increased sharing discernment (interaction b = 0.038, z = 7.102, p < 0.001), a 71.7% increase over baseline discernment in control (veracity b = 0.053, z = 6.636, p < 0.001). Random-effects meta-analysis across studies showed significant heterogeneity: Q(19) = 88.53, p < 0.001, I² = 78.5% (k = 20).
  • False vs true sharing: Effects driven by reduced sharing of false headlines (treatment main effect b = −0.034, z = −7.851, p < 0.001), about a 10% decrease relative to control baseline for false (intercept b = 0.341, z = 15.695, p < 0.001). No significant effect for true headlines (b = 0.006, z = 1.44, p = 0.150). Meta-analytic forest plots: false sharing overall DL = −0.03 (−0.04, −0.03), with low heterogeneity Q(19) = 23.33, p = 0.223, I² = 18.5%; true sharing overall DL = 0.01 (−0.00, 0.01), Q(19) = 22.42, p = 0.264, I² = 15.3%.
  • Evaluation-only prompt (k = 14): Increased discernment (b = 0.034, z = 7.823, p < 0.001; +59.6% over baseline), reduced false sharing (b = −0.027, z = −5.548, p < 0.001; −8.2% vs baseline intercept b = 0.330), no significant effect on true sharing (b = 0.009, z = 1.89, p = 0.059).
  • Stacking prompts (Evaluation combined with others or repeated): Larger discernment gains (b = 0.054, z = 2.765, p = 0.006; +100.8% over baseline), larger reductions in false sharing (b = −0.048, z = −2.990, p = 0.003; −16.5% vs baseline intercept b = 0.291), no significant effect on true sharing (b = 0.008, z = 0.775, p = 0.438).
  • Non-evaluation prompts also effective: Increased discernment (b = 0.039, z = 4.974, p < 0.001; +70.9% over baseline), reduced false sharing (b = −0.039, z = −5.161, p < 0.001; −11.0% vs baseline intercept b = 0.356), no effect on true sharing (b = 0.002, z = 0.338, p = 0.735).
  • Item-level mechanism: Treatment effect magnitude for a headline strongly correlated with its perceived accuracy (meta-analytic r = 0.773, z = 19.417, p < 0.001) with no significant heterogeneity, Q(14) = 18.99, p = 0.165; pooled across items r(355) = 0.663, p < 0.001.
  • Political concordance: Accuracy prompts more effective for politically concordant headlines (three-way interaction b = 0.015, z = 3.124, p = 0.002), likely because baseline sharing is higher for concordant content (b = 0.102, z = 11.276, p < 0.001). Baseline discernment did not differ by concordance (b = 0.007, z = 1.085, p = 0.278).
  • Study-level moderators: Larger effects on MTurk vs Lucid/YouGov (b = 0.033, t = 2.35, p = 0.032); smaller effects where baseline control discernment was higher (b = −0.468, t = −2.42, p = 0.028); no significant difference between politics and COVID-19 topics (b = −0.017, t = −1.21, p = 0.244). Significant positive effects remain within Lucid/YouGov (b = 0.030, z = 7.102, p < 0.001).
  • Order effects: No evidence of decay across trials (three-way interaction with trial number b = −0.001, z = −1.869, p = 0.062; excluding first four trials b ≈ −0.000, z = −0.292, p = 0.770). Effects persisted throughout sessions.
  • Individual differences: No robust moderation by gender or race; effects larger among older participants (in representative samples), college-educated, higher CRT (reflectiveness), and more attentive participants. Ideology did not moderate effects in representative samples, though conservatives/Republicans showed worse baseline discernment in those samples. On MTurk, partisanship moderated effects (smaller among Republicans) despite no baseline partisan discernment difference; however, prompts still significantly improved discernment among Republicans on MTurk.
  • Overall: Accuracy prompts reliably improve sharing quality primarily by reducing willingness to share false content, with generalizability across topics and prompt types and persistence over the course of an experimental session.
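The pooled estimates and heterogeneity statistics above can be related to one another with a short Python sketch. The "DL" label on the forest-plot estimates suggests the DerSimonian-Laird random-effects estimator, so the sketch assumes that method; the summary does not confirm the exact software or settings used. The three study estimates in the toy example are hypothetical, and the function names are illustrative. The last two helpers only re-derive reported figures: I² from Q and its degrees of freedom, and the percentage change of an effect relative to a baseline coefficient.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Pool study estimates with the DerSimonian-Laird random-effects method."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0      # between-study variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return {
        "pooled": pooled,
        "se": math.sqrt(1.0 / sum(w_star)),
        "q": q,
        "df": df,
        "tau2": tau2,
    }


def i_squared(q, df):
    """Share of total variation attributed to heterogeneity, in percent."""
    return max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0


def relative_change(effect, baseline):
    """Effect expressed as a percentage of a baseline coefficient."""
    return 100.0 * effect / baseline


# Hypothetical study-level interaction estimates and standard errors.
toy = dersimonian_laird([0.02, 0.04, 0.06], [0.01, 0.01, 0.01])
print(f"pooled = {toy['pooled']:.3f}, tau^2 = {toy['tau2']:.4f}, Q = {toy['q']:.2f}")

# Re-deriving figures reported in the findings (agreement is up to rounding).
print(f"I^2 from Q(19) = 88.53: {i_squared(88.53, 19):.1f}%")
print(f"0.038 relative to 0.053: {relative_change(0.038, 0.053):.1f}%")
print(f"-0.034 relative to 0.341: {relative_change(-0.034, 0.341):.1f}%")
```

Because the inputs are the rounded values printed in the summary, small last-digit differences from the reported percentages are expected.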
Discussion

Findings support the hypothesis that inattention to accuracy contributes to low-quality sharing: making accuracy salient increases sharing discernment, predominantly by reducing sharing of content perceived as inaccurate. The strong item-level link between perceived inaccuracy and reduced sharing is consistent with a mechanism based on directing attention. Effects are broadly generalizable across politics and COVID-19 content and across different prompt implementations, suggesting a content-general process. Stacked prompts can enhance effectiveness. Gains are larger in studies where baseline discernment is worse and among more attentive participants, implying that contexts with distracting or emotionally/morally charged content may benefit most. Minimal moderation by ideology in representative samples suggests accuracy prompts can improve sharing quality without exacerbating partisan divides, although baseline discernment differences by ideology remain. The persistence of effects across trials suggests that brief prompts can have sustained within-session influence, which is relevant for practical deployment. The results also inform debates about behavioral priming by demonstrating a replicable, policy-relevant effect of making accuracy salient on consequential sharing intentions.

Conclusion

This internal meta-analysis of 20 experiments (N = 26,863) demonstrates that accuracy prompts are a replicable and generalizable intervention for improving the quality of news shared online. Prompts reliably increase sharing discernment, mainly by decreasing intentions to share false content, with no consistent reduction in sharing of true content. Effects are observed across topics (politics and COVID-19), prompt types, and subject pools, persist over trials, and are stronger among older, more reflective, and more attentive participants. Stacking prompts yields larger benefits. Future research should expand external validation via additional independent groups and cross-cultural samples, conduct more field experiments across platforms (e.g., Twitter, Facebook, YouTube) and delivery modes (ads, public posts), refine mechanistic accounts (e.g., limited-attention or drift-diffusion models), and examine interactions with complementary interventions (e.g., labeling, media/digital literacy). Incorporating believability assessments for target content and considering audience characteristics will help optimize deployment.

Limitations

As an internal meta-analysis, results are based on studies conducted by the authors’ group and do not comprehensively incorporate external studies; the meta-analysis was not preregistered. Samples were U.S.-only, limiting cultural generalizability. Some included measures (e.g., attention checks, CRT) were collected only in subsets of experiments, and item-level perceived accuracy relied on out-of-sample ratings. Meta-regressions examining heterogeneity across studies had limited power. Headline sets and subject pools varied (MTurk vs representative panels), and while random-effects models address variability, unmeasured contextual factors may influence effect sizes. Studies in which accuracy was rated for every item were excluded due to known strong effects on sharing, which bounds generalizability to prompt-before-task designs.
