Affect in science communication: a data-driven analysis of TED Talks on YouTube
O. Fischer, L. T. Jeitziner, et al.
A study by Olivia Fischer, Loris T. Jeitziner, and Dirk U. Wulff on how emotions shape audience engagement with TED Talks on YouTube, examining the links between word choice, popularity, and audience reactions.
Introduction
The study addresses how affect in science communication influences public engagement on social media. In the competitive, algorithm-driven environment of platforms like YouTube, surface-level features such as language choice can determine reach. The authors propose that affect may shape engagement. They capture affect as valence (positivity/negativity) and density (the proportion of affect-laden words). They pose two research questions: (1) How is affect used in TED Talks compared with other media used for science communication? (2) Is affect, as a surface-level characteristic, associated with audience engagement (views, likes, dislikes, comments) with TED Talks on YouTube? These questions matter because scientists increasingly rely on lay audiences to disseminate content. They therefore need evidence-based guidance to optimize their communication for reach and impact.
Literature Review
Prior work shows that affect can drive dissemination. News articles with more affect were more likely to be emailed (Berger & Milkman, 2012), and affect-rich descriptions of scientific findings increased sharing and citations (Milkman & Berger, 2014; Fronzetti Colladon et al., 2020). Research on YouTube science communication has emphasized two things rather than affect in the content itself: presenter characteristics (e.g., gender, authenticity) and viewers’ responses (e.g., comment sentiment, eye-tracking) (Amarasekara & Grant, 2019; Kaul et al., 2020; Boy et al., 2020; Shapiro & Park, 2015). TED Talks are widely used as a modern form of science communication but vary in topic and speaker background. Existing studies have distinguished their topics using TED's tags (Sugimoto & Thelwall, 2013). Sentiment analysis commonly relies on dictionaries such as LIWC. Here, the authors instead use SentiWordNet and distinguish between valence and the novel construct of affective density. The identified gap is a systematic assessment of how affect in social media-based science communication relates to public engagement, particularly on YouTube.
Methodology
Data collection and matching: The authors downloaded all available TED Talk transcripts and metadata from ted.com (N=6304). They removed interview-only transcripts (465), leaving 5839 transcripts. Engagement data (views, likes, dislikes, comments) for 3545 videos on the TED YouTube channel were obtained via the YouTube API. Transcripts and videos were matched by titles using exact matching (2475) and approximate matching plus manual checks (487), yielding 2962 matched entries published between early 2007 and end of 2020 (data collected Dec 29, 2020).
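The two-stage title matching can be sketched with Python's standard-library `difflib.get_close_matches` for the approximate stage. The sketch assumes titles are compared after lowercasing and stripping punctuation. The function names, the normalization, and the `cutoff` value are hypothetical. The source only says that exact matching was followed by approximate matching with manual checks, so an approximate hit here is a candidate for review, not a confirmed match.

```python
import difflib
import re


def normalize(title):
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed normalization)."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def match_titles(ted_titles, youtube_titles, cutoff=0.9):
    """Return (exact, approximate) dicts mapping TED title -> YouTube title.

    Stage 1: exact match on normalized titles.
    Stage 2: for still-unmatched TED titles, take the single closest unused
    YouTube title whose difflib similarity ratio is >= cutoff (hypothetical
    threshold). In the study, such candidates were checked manually.
    """
    yt_by_norm = {normalize(t): t for t in youtube_titles}
    exact, approximate, unmatched = {}, {}, []
    for title in ted_titles:
        key = normalize(title)
        if key in yt_by_norm:
            exact[title] = yt_by_norm[key]
        else:
            unmatched.append(title)
    # Only YouTube titles not already used by an exact match remain candidates.
    used = set(exact.values())
    remaining = {n: t for n, t in yt_by_norm.items() if t not in used}
    for title in unmatched:
        hits = difflib.get_close_matches(normalize(title), list(remaining), n=1, cutoff=cutoff)
        if hits:
            approximate[title] = remaining.pop(hits[0])  # each video matched at most once
    return exact, approximate
```

With `cutoff=0.8`, consider a YouTube title that appends the speaker's name ("Inside the mind of a master procrastinator | Tim Urban"). Its normalized form has a similarity ratio of about 0.89 with the TED title, so it would be proposed as an approximate match.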
Topic inference and science index: Using 447 TED-assigned tags (mean 8.2 per talk), the authors computed Jaccard similarities between tag pairs to build a weighted tag network. They applied the Louvain modularity algorithm (igraph in R) to identify seven tag communities, or topics: Mind, Entertainment, Tech, Health, Cosmos, Environment, Society. Each talk was assigned to the topic with which it had the maximum positive pointwise mutual information (PMI). Validation used title-word PMI and Universal Sentence Encoder similarity, showing higher within-topic than between-topic similarity (Cohen’s d from 0.18 to 0.69). A science index was computed per topic as the percentage of its talks that carry the tag “Science” or whose transcripts contain “science,” “experiment,” or “study.” Science index by topic: Health 79%, Cosmos 78%, Mind 69%, Environment 64%, Tech 58%, Society 43%, Entertainment 37%.
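The tag network, topic assignment, and science index can be sketched as follows. The sketch rests on two assumptions, both plausible readings rather than details confirmed by the source:

- The Jaccard similarity between two tags is computed over the sets of talks carrying each tag.
- Talk–topic PMI is estimated from counts of a talk's tags that fall into each topic.

The Louvain step itself is omitted (the authors used igraph in R), so `tag_topic`, a mapping from tag to community, is assumed to be given. The science index uses simple substring matching, which is also an assumption. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def jaccard_tag_edges(talk_tags):
    """Weighted tag network: Jaccard similarity of the talk sets behind each tag pair.

    talk_tags: dict mapping talk id -> set of tags (hypothetical input format).
    Returns {(tag_a, tag_b): similarity} for pairs that share at least one talk.
    """
    talks_by_tag = {}
    for talk, tags in talk_tags.items():
        for tag in tags:
            talks_by_tag.setdefault(tag, set()).add(talk)
    edges = {}
    for a, b in combinations(sorted(talks_by_tag), 2):
        shared = len(talks_by_tag[a] & talks_by_tag[b])
        if shared:  # keep only non-zero edges
            edges[(a, b)] = shared / len(talks_by_tag[a] | talks_by_tag[b])
    return edges


def assign_topics(talk_tags, tag_topic):
    """Assign each talk to the topic with maximal positive PMI.

    tag_topic: dict mapping tag -> community label (e.g. from Louvain).
    PMI is estimated from counts of each talk's tags per topic (an assumption).
    Talks with no positive PMI are mapped to None.
    """
    joint = Counter()
    for talk, tags in talk_tags.items():
        for tag in tags:
            joint[(talk, tag_topic[tag])] += 1
    total = sum(joint.values())
    talk_n, topic_n = Counter(), Counter()
    for (talk, topic), n in joint.items():
        talk_n[talk] += n
        topic_n[topic] += n
    assignment = {}
    for talk in talk_tags:
        best, best_pmi = None, 0.0
        for topic in topic_n:
            n = joint[(talk, topic)]
            if n == 0:
                continue
            pmi = math.log(n * total / (talk_n[talk] * topic_n[topic]))
            if pmi > best_pmi:  # strictly positive PMI only
                best, best_pmi = topic, pmi
        assignment[talk] = best
    return assignment


SCIENCE_WORDS = ("science", "experiment", "study")


def science_index(talks):
    """Percentage of (tags, transcript) pairs tagged 'Science' or mentioning a science word.

    Uses plain substring matching on the lowercased transcript (an assumption),
    so a word such as "studying" also counts.
    """
    hits = sum(
        1
        for tags, transcript in talks
        if "Science" in tags or any(w in transcript.lower() for w in SCIENCE_WORDS)
    )
    return 100 * hits / len(talks)
```

Calling `science_index` on the talks of a single topic yields that topic's percentage, in the same sense as the figures listed above.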
Sentiment analysis: The authors used SentiWordNet (20,000+ words with scores from −1 to 1; mean −0.06, SD = 0.34). Two measures were computed per transcript over the $n$ tokens that have a sentiment score $s_i$: (1) affective valence, the mean score, $V = \frac{1}{n}\sum_{i=1}^{n} s_i$; (2) affective density, the proportion of those tokens with a nonzero score, $D = \frac{1}{n}\sum_{i=1}^{n} \mathbb{1}(s_i \neq 0)$, where $\mathbb{1}$ is the indicator function. The distinction between valence and density is emphasized as novel in this context.
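A minimal sketch of the two affect measures, assuming a precomputed word-level lexicon that maps each word to one score in [−1, 1]. How SentiWordNet's synset-level positive and negative scores are collapsed into a single word score is not described here, so `lexicon`, its values, and the function name are hypothetical.

```python
def affect_scores(tokens, lexicon):
    """Return (valence, density) for a tokenized transcript.

    Only tokens present in `lexicon` (word -> score in [-1, 1]) are counted,
    mirroring "words with available scores". Returns (nan, nan) if none are.
    """
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        return float("nan"), float("nan")
    valence = sum(scores) / len(scores)                       # mean score V
    density = sum(1 for s in scores if s != 0) / len(scores)  # share of nonzero scores D
    return valence, density


lexicon = {"love": 0.5, "great": 0.75, "table": 0.0, "war": -0.5}  # hypothetical scores
print(affect_scores("i love this great table but war".split(), lexicon))
```

Here four tokens have scores (0.5, 0.75, 0.0, −0.5) and three of them are nonzero. The call therefore prints `(0.1875, 0.75)`: valence is 0.75 / 4 and density is 3 / 4. Unscored tokens such as "i" or "but" affect neither measure.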
Engagement dimensions: Principal component analysis (PCA) on the YouTube engagement variables (views, likes, dislikes, comments; intercorrelations 0.70–0.92) yielded two components accounting for 95.4% of the variance: Popularity (high loadings for views and likes) and Polarity (high loadings for dislikes and comments).
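A minimal PCA sketch with NumPy, assuming the four engagement counts are z-standardized so that the decomposition runs on their correlation matrix. The source does not state the preprocessing; heavy-tailed counts are often log-transformed first. It also does not say whether the components were rotated. The reported Popularity/Polarity loading pattern may reflect a rotation such as varimax, which is not shown here. The function name is hypothetical.

```python
import numpy as np


def engagement_pca(X):
    """PCA on an (n_videos, 4) array: views, likes, dislikes, comments.

    Columns are z-standardized, so this is PCA on the correlation matrix
    (assumed preprocessing). Returns (explained_variance_ratio, loadings),
    both ordered by decreasing variance; loadings are eigenvectors scaled
    by the square root of their eigenvalues.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    return ratio, loadings
```

On the study's data, `ratio[:2].sum()` would correspond to the reported 95.4%. Applying `np.log1p` to the raw counts before the call is one optional preprocessing choice.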
Comparative media analysis: The authors compared TED Talk valence and density to samples from other media: arXiv preprints (STEM), Psychological Science articles, Wikipedia, news, books, and video-based media (movies, TV shows, soap operas).
Statistical modeling: Separate multiple regressions predicted Popularity and Polarity from valence and density. Covariates: topic, YouTube publishing date, video duration, and Flesch Reading Ease (readability). Moderation analyses compared affect effects within versus outside specific topics or tags (excluding the topic factor to test moderation). Additional analyses tested moderation by the science index. Effect sizes are reported as Cohen’s d with F-tests and p-values (Table 1).
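A minimal sketch of one such regression with NumPy, in three steps:

- Fit ordinary least squares.
- Test a single predictor with a partial F-test (full versus reduced model).
- Convert the 1-df F to a signed Cohen's d using the common conversion d = 2√(F/df_error).

The conversion is an assumption, not confirmed as the authors' formula. The design matrix `X` is assumed to already contain an intercept column and all covariates. Topic would enter as dummy columns alongside publishing date, duration, and readability. Function names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """OLS fit of y on X (X includes an intercept column); returns (RSS, coefficients)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), beta


def partial_f(X, y, col):
    """Partial F-test for dropping the single column `col` from the full design matrix X.

    Returns (F, residual degrees of freedom of the full model, coefficient of `col`).
    """
    rss_full, beta = fit_ols(X, y)
    rss_reduced, _ = fit_ols(np.delete(X, col, axis=1), y)
    df_err = X.shape[0] - X.shape[1]
    f = (rss_reduced - rss_full) / (rss_full / df_err)
    return f, df_err, beta[col]


def f_to_d(f, df_err, sign=1.0):
    """Cohen's d from a 1-df F statistic via d = 2 * sqrt(F / df_err), signed by `sign`."""
    return float(np.sign(sign) * 2 * np.sqrt(f / df_err))
```

As a rough check, assume about 2,950 error degrees of freedom. That figure is hypothetical: 2962 talks minus an intercept, valence, density, six topic dummies, date, duration, and readability. Under it, F = 32.21 gives d ≈ 0.21, close to the reported density effect. The same assumption gives ≈ 0.11 rather than 0.12 for valence (F = 9.68), so the authors' exact conversion or sample size may differ. The topic factor has 6 df and would need a different conversion.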
Key Findings
Use of affect in TED Talks vs. other media: TED Talks exhibited higher valence and especially higher density than text-based media (arXiv, Psychological Science, books, Wikipedia, news). Both measures were lower than in other video media (movies, TV, soap operas). Overall, affect use in TED Talks more closely resembled video-based media. Over time, valence has decreased since 2007, while density has increased in recent years. Affect also varied by topic (highest valence in Entertainment, highest density in Mind).
Engagement associations (Table 1; effect sizes as Cohen’s d):
- Popularity: Positively associated with valence (d=0.12, F=9.68, p=0.002) and density (d=0.21, F=32.21, p<0.001). Density’s effect was nearly twice that of valence, though both were small. Longer duration (d=0.32, F=72.08, p<0.001) and higher readability (d=0.20, F=30.43, p<0.001) also predicted higher popularity. Topic effects: Mind positively associated; Environment and Society negatively associated; overall topic factor significant (d=0.42, F=21.74, p<0.001). Publishing date not significant for popularity (d=-0.05, F=1.76, p=0.184).
- Polarity: Negatively associated with valence (d=-0.08, F=5.05, p=0.024), indicating that more negative valence was linked to higher polarity; density not associated (d=0.02, F=0.29, p=0.589). Longer duration was associated with higher polarity (d=0.20, F=28.36, p<0.001). Topic effects: Society positive; Health, Cosmos, Environment negative; topic factor significant (d=0.43, F=22.58, p<0.001). Publishing date showed the largest association (d=0.51, F=188.07, p<0.001).
Moderation by topic and tags:
- Popularity: Environment (tags “Green,” “Sustainability”) showed reduced density–popularity effect (density no longer related; d≈-0.02). Mind (tags “Decision-Making,” “Mental Health”) showed increased density–popularity effect. Society (tags “Immigration,” “Refugees”) showed stronger valence–popularity effect. Health (tags “Medicine,” “DNA”) showed a slight negative valence–popularity relationship (d≈-0.07). Cosmos, Tech, Entertainment showed smaller moderation.
- Polarity: Tech (tags “AI,” “Machine Learning”) and Environment (“Green,” “Sustainability”) showed increased density–polarity effects (Tech d=0.49; Environment d=0.27). Society (tags “Refugees,” “Criminal Justice”) showed a reduced density–polarity effect, yielding a small negative association (d=-0.17). Entertainment showed an increased valence–polarity effect (positive valence associated with slightly higher polarity; d=0.14). Mind, Cosmos, Health showed smaller moderation.
Moderation by science index: Small moderation for popularity—slightly reduced valence–popularity and slightly increased density–popularity; no moderation for polarity. Within talks with a positive science index: valence–popularity d=0.06; valence–polarity d=-0.07; density–popularity d=0.29; density–polarity d=0.02.
Overall: Higher valence and density predict higher popularity; higher valence predicts lower polarity; topic content moderates these effects, while the presence of scientific content per se (science index) does not meaningfully change them.
Discussion
The findings address the research questions in two ways. First, TED Talks employ more affective language than traditional text-based scientific media. Second, affective characteristics of the transcript predict distinct forms of engagement on YouTube. Specifically, higher positivity (valence) and greater use of affect-laden words (density) are associated with higher popularity (views and likes), while higher positivity is associated with lower polarity (dislikes and comments). Topic-specific moderation indicates that content domain shapes how affect relates to engagement. Controversial or highly salient topics (e.g., AI, refugees, sustainability) show stronger or reversed associations.
The authors suggest two nonexclusive mechanisms: (1) affective language may alter audience mood/arousal, influencing engagement propensity; (2) affect may signal opinionated or assertive stances, eliciting supportive or critical responses. Practically, science communicators might enhance reach by judiciously increasing affective density and positive valence, while recognizing that effects can be topic-dependent and that over-positivity or mismatch with audience expectations may backfire. The distinction between popularity and polarity underscores the multifaceted nature of engagement and the need to tailor communication strategies to desired outcomes.
Conclusion
This study demonstrates that affect in TED Talk transcripts, operationalized as valence and density, is systematically associated with public engagement on YouTube. Density is especially predictive of popularity, and valence is inversely related to polarity. Affect use in TED Talks aligns more with video-based than text-based media. Topic content moderates affect–engagement links, whereas the presence of scientific content per se does not meaningfully alter them. These insights provide actionable guidance for science communicators seeking to optimize reach and response on social platforms.
Future research should use experimental designs to establish causality, explore additional engagement metrics such as shares, and test generalizability across other platforms and formats (e.g., academic social media posts, press releases, blogs). Further work could investigate mechanistic pathways (e.g., mood, arousal, perceived assertiveness) and interactions with other communicative features (jargon use, narrative structure, visuals).
Limitations
- Correlational design precludes causal inference; experimental studies are needed to identify mechanisms.
- Generalizability is uncertain: TED Talks represent a specific format and audience; effects may differ for text-based or other science communication formats (e.g., press releases, academic social media).
- Mixed content on TED (scientific and non-scientific) may influence audience evaluations; context-specific dynamics could differ from platforms focused solely on science.
- Engagement data lacked shares, an important participatory metric likely related to popularity and affect.
- The data span 2007–2020 and are limited to videos on the TED YouTube channel; platform changes and temporal dynamics may affect engagement patterns.
- Data-driven approach limits insight into precise psychological mechanisms linking affect to engagement.