Where do cross-cutting discussions happen?: Identifying cross-cutting comments on YouTube videos of political vloggers and mainstream news outlets

Political Science

S. W. Chae and S. H. Lee

Are online comment sections echo chambers or unexpected marketplaces of debate? Researchers Seung Woo Chae and Sung Hyun Lee analyze comments on political vlogger videos and mainstream news outlet videos, using manual coding and NLP classifiers, to show how channel political leaning and media type shape cross-cutting discussions, and they suggest that neutral outlets may foster debate.
Introduction

The study examines whether social media, particularly YouTube, fosters echo chambers or facilitates cross-cutting discussions between opposing political groups. Building on debates around selective exposure and algorithmic reinforcement, it focuses on political vloggers versus mainstream news outlets and formulates six research questions:

  • RQ1-1: How many cross-partisan comments appear on conservative and liberal vlogger channels?
  • RQ1-2: Is cross-cutting commenting asymmetric by vlogger leaning?
  • RQ2: Can vlogger comments train NLP models to classify mainstream news comments?
  • RQ3-1: Do cross-cutting proportions differ between conservative and liberal mainstream outlets?
  • RQ3-2: What is the conservative-to-liberal comment ratio on a neutral outlet (C-SPAN)?
  • RQ4: How do cross-cutting proportions compare between vloggers and mainstream outlets?

Literature Review

The paper reviews echo chamber research and selective exposure, highlighting concerns that social media algorithms reinforce preexisting beliefs and polarization. Contradictory evidence shows cross-cutting discussions occur online, challenging the inevitability of echo chambers. Wu and Resnick (2021) found asymmetric cross-cutting on YouTube—more conservatives comment on left-leaning videos than liberals on right-leaning ones—and higher cross-cutting on mainstream outlets than independent media. The authors identify a gap: prior work grouped independent outlets and individual vloggers together, obscuring vlogger-specific dynamics. They also discuss parasocial relationships between YouTube influencers and audiences, suggesting that intimacy with vloggers may reduce cross-cutting by aligning viewers’ stances with the persona.

Methodology

Design: Two-stage analysis: manual coding of vlogger comments as training data, then supervised NLP classification of mainstream news comments as target data.

Topic: A single issue, the Mueller report (March–April 2019), to maintain contextual homogeneity. Comments were collected from videos posted between March 22, 2019 (report submitted) and April 18, 2019 (Attorney General Barr's press conference).

Data collection: YouTube API; only top-level comments (replies excluded). Collection was completed within roughly three hours to minimize time variance.

Vlogger sample: Judgment sampling from top political vloggers (Feedspot, Socialblade). Inclusion criteria: a video on the Mueller report within one week of March 22, 2019; ≥100,000 views or ≥100 comments; balance of conservative and liberal channels. Final sample: 10 videos (5 conservative: Steven Crowder, Mark Dice, Paul Joseph Watson, Hunter Avallone, RonPaulLibertyReport; 5 liberal: The Jimmy Dore Show, Secular Talk, David Pakman Show, The Majority Report w/ Sam Seder, TBTV). From each video, 100 randomly selected top-level comments; total N = 1,000 (500 from conservative vlogger videos, 500 from liberal vlogger videos).

Manual coding: Two coders trained with a codebook defining four labels: conservative, liberal, other, indeterminable. Intercoder reliability on 200 overlapping comments: Gwet's AC1 = .835 (above the .800 threshold); the remaining 800 comments were split between the coders. Cross-cutting was operationalized as comments whose leaning opposes the channel's leaning, tested with chi-square on binary cross-cutting vs. non-cross-cutting counts.

Mainstream news sample: Seven videos of Barr's April 18, 2019 press conference with no commentary, from conservative (Fox News; LiveNOW from FOX), liberal (CNN; MSNBC), and neutral (C-SPAN) outlets. Totals: 4,230 top-level comments (conservative outlets: 2,421; liberal outlets: 1,672; neutral: 137).

Preprocessing and embeddings: spaCy for stopword removal (326 stopwords) and tokenization; sentence embeddings via Sentence-BERT (SBERT), model all-mpnet-base-v2.

Models: Supervised multiclass classification with three labels: conservative, liberal, and neither (the original manual categories other and indeterminable combined). Algorithms: logistic regression, SVM, random forest.

Evaluation: On 200 randomly sampled mainstream comments, predictions were compared to human coding (one coder). Metrics: accuracy, macro-F1, macro-precision, macro-recall; baseline accuracy = .333. Models exceeding the baseline were applied to the remaining 4,030 comments. Proportions were computed per outlet leaning, along with the conservative-to-liberal ratio for the neutral outlet, chi-square tests for cross-cutting asymmetry by outlet leaning, and chi-square tests for the vlogger vs. mainstream comparison (using the best-performing model).
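The classification stage described above can be sketched with scikit-learn. This is a minimal illustration, not the authors' code: random vectors stand in for the 768-dimensional all-mpnet-base-v2 SBERT embeddings (computing real embeddings requires the sentence-transformers package and a model download), and random labels stand in for the manual codes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

rng = np.random.default_rng(0)

# Stand-ins for SBERT embeddings of the 1,000 manually coded vlogger
# comments; labels: 0 = conservative, 1 = liberal, 2 = neither.
X_train = rng.normal(size=(1000, 768))
y_train = rng.integers(0, 3, size=1000)

# Stand-ins for the 200 human-coded mainstream comments used for evaluation.
X_eval = rng.normal(size=(200, 768))
y_eval = rng.integers(0, 3, size=200)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_eval)

# Macro-averaged metrics, as in the paper's evaluation
# (chance baseline accuracy for three balanced classes = 1/3).
metrics = {
    "accuracy": accuracy_score(y_eval, pred),
    "macro_f1": f1_score(y_eval, pred, average="macro"),
    "macro_precision": precision_score(y_eval, pred, average="macro", zero_division=0),
    "macro_recall": recall_score(y_eval, pred, average="macro"),
}
print(metrics)
```

With random stand-in data the scores hover near the .333 baseline; the point is the pipeline shape (fit on vlogger comments, evaluate against human codes, then apply the best model to the remaining mainstream comments), not the numbers.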

Key Findings

Manual vlogger analysis (N=1,000):

  • Conservative vlogger videos (n = 500): conservative 381 (76.2%), liberal 15 (3.0%), other 20 (4.0%), indeterminable 84 (16.8%). Cross-cutting (liberal comments on conservative videos): 3.0%.
  • Liberal vlogger videos (n = 500): liberal 350 (70.0%), conservative 50 (10.0%), other 5 (1.0%), indeterminable 95 (19.0%). Cross-cutting (conservative comments on liberal videos): 10.0%.
  • The difference is significant: X²(1, N = 1000) = 19.02, p < .001.

Model performance on 200 mainstream comments:

  • Logistic regression: accuracy .640; macro-F1 .578; macro-precision .623; macro-recall .568.
  • SVM: accuracy .615; macro-F1 .558; macro-precision .567; macro-recall .567.
  • Random forest: accuracy .610; macro-F1 .439; macro-precision .562; macro-recall .446.

Mainstream predictions (total N = 4,230):

  • Logistic regression (best model):
      • Conservative outlets (n = 2,421): conservative 1,282 (53.0%); liberal 949 (39.2%); neither 190 (7.9%).
      • Liberal outlets (n = 1,672): conservative 909 (54.4%); liberal 704 (42.1%); neither 59 (3.5%).
      • Neutral outlet, C-SPAN (n = 137): conservative 64 (46.7%); liberal 62 (45.3%); neither 11 (8.0%); conservative-to-liberal ratio ≈ 1.03:1.00.
      • Variation by outlet leaning significant: X²(4, N = 4230) = 35.20, p < .001.
      • Cross-cutting asymmetry: liberal outlets (conservative comments) 54.4% vs. conservative outlets (liberal comments) 39.2%; X²(1, N = 4093) = 91.17, p < .001.
  • SVM:
      • Conservative outlets: conservative 1,175 (48.5%); liberal 943 (39.0%); neither 303 (12.5%).
      • Liberal outlets: conservative 827 (49.5%); liberal 730 (43.7%); neither 115 (6.9%).
      • Neutral outlet: conservative 62 (45.3%); liberal 62 (45.3%); neither 13 (9.5%); ratio 1.00:1.00.
      • Variation by outlet leaning: X²(4, N = 4230) = 37.42, p < .001; cross-cutting asymmetry: X²(1, N = 4093) = 44.09, p < .001.
  • Random forest:
      • Conservative outlets: conservative 1,730 (71.5%); liberal 663 (27.4%); neither 28 (1.2%).
      • Liberal outlets: conservative 1,113 (66.6%); liberal 545 (32.6%); neither 14 (0.8%).
      • Neutral outlet: conservative 85 (62.0%); liberal 51 (37.2%); neither 1 (0.7%); ratio 1.67:1.00.
      • Variation by outlet leaning: X² = 17.25, p = .004 (Monte Carlo, 2,000 replicates); cross-cutting asymmetry: X²(1, N = 4093) = 616.52, p < .001.

Vlogger vs. mainstream (logistic regression):

  • Conservative channels: mainstream cross-cutting 39.2% vs. vlogger 3.0%; X²(1, N = 2921) = 91.17, p < .001.
  • Liberal channels: mainstream cross-cutting 54.4% vs. vlogger 10.0%; X²(1, N = 2172) = 243.96, p < .001.

Additional observation: Across models, conservative comments outnumbered liberal comments on both conservative and liberal mainstream outlets, likely reflecting a contemporaneous news context favorable to conservatives.

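Two of the headline figures can be checked directly from the reported counts. A minimal sketch assuming SciPy, reproducing the vlogger cross-cutting asymmetry test (15 of 500 vs. 50 of 500) and the C-SPAN conservative-to-liberal ratio (64 vs. 62 predicted comments); the counts are taken from the results above, not from rerun models:

```python
from scipy.stats import chi2_contingency

# 2x2 table of cross-cutting vs. non-cross-cutting comments:
# row 1: conservative vlogger videos (15 liberal of 500)
# row 2: liberal vlogger videos (50 conservative of 500)
table = [[15, 485],
         [50, 450]]

# chi2_contingency applies Yates' continuity correction to 2x2 tables
# by default, which matches the reported X^2(1, N = 1000) = 19.02.
chi2, p, dof, expected = chi2_contingency(table)
print(round(chi2, 2), dof, p < 0.001)  # 19.02 1 True

# Neutral-outlet (C-SPAN) ratio from the logistic-regression predictions:
ratio = 64 / 62
print(round(ratio, 2))  # ~1.03, i.e., roughly 1.03:1.00
```

The same two-line pattern applies to the other 2×2 asymmetry tests in the findings; only the counts change.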
Discussion

Findings substantiate that cross-cutting comments occur more on liberal channels than conservative ones, both among vloggers and mainstream outlets, aligning with prior evidence of asymmetric cross-partisan engagement. Mainstream outlets host substantially more cross-cutting than vlogger channels, suggesting media type shapes interaction patterns; parasocial relationships and community dynamics around vloggers may reinforce ideological homogeneity and reduce cross-cutting. Neutral outlets (C-SPAN) showed near-balanced conservative-liberal participation and qualitatively less antagonistic content, indicating such venues may facilitate more constructive dialogues. However, cross-cutting metrics can mask hostile or troll-like exchanges, so quantitative measures should be complemented by qualitative assessment. Methodologically, using vlogger comments as training data yielded reasonably strong multiclass classification performance, indicating platform-consistent training data can enhance NLP analyses of political leanings. The distribution of comment leanings appears sensitive to news context for mainstream outlets but less so for vloggers, whose communities may be more stable and insulated from topic salience.

Conclusion

The study extends cross-cutting discussion research by isolating political vloggers from other media types and comparing them to mainstream outlets on YouTube. It confirms asymmetric cross-cutting favoring liberal channels, shows mainstream outlets host higher shares of cross-cutting than vlogger channels, and suggests neutral outlets as promising venues for constructive cross-partisan engagement. It demonstrates that vlogger comments can serve as effective training data for NLP classification of political leanings across media types. Future work should: analyze recommendation and comment networks around vlogger communities; examine the role of parasocial relationships and community culture in shaping discourse; broaden topical scope beyond a single issue; and integrate qualitative assessments to interpret cross-cutting quality and context effects.

Limitations

Generalizability is limited by reliance on a single political issue (Mueller report) to ensure contextual homogeneity for NLP. Multiclass classifiers showed moderate accuracy, necessitating cautious interpretation of outlet-level results. The study did not account for individual channel/community cultures among vloggers, which may influence discourse patterns. Replies were excluded, potentially omitting conversational context that could affect classification and cross-cutting measurement.
