
Political Science
How to convince in a televised debate: the application of machine learning to analyze why viewers changed their winner perception during the 2021 German chancellor discussion
F. Ettensperger, T. Waldvogel, et al.
This study by Felix Ettensperger, Thomas Waldvogel, Uwe Wagschal, and Samuel Weishaupt examines what sways viewers' perceptions of the debate winner during the 2021 German chancellor discussion. Using machine learning, it uncovers how pre-debate preferences, candidate images, and pivotal speech moments drive within-debate shifts in winner judgments.
~3 min • Beginner • English
Introduction
The 2021 German federal election was unusual: no incumbent chancellor ran, three parties (CDU/CSU, SPD, Greens) competitively led polls at different times, and three candidates (Baerbock, Laschet, Scholz) had realistic chances. Televised debates (Trielle) played a central role in political communication, reaching 4–11 million viewers; the second Triell, with about 11 million viewers and held two weeks before Election Day, was particularly consequential. The study asks who was perceived to have won the debate and, crucially, why viewers changed their winner perception during the debate. Prior work indicates debate winner perceptions influence electoral decisions and, in Germany, may be less constrained by strong partisanship than in the U.S. The authors leverage a large-N design linking pre- and post-debate surveys with second-by-second real-time response (RTR) ratings to move beyond average performance assessments and identify how specific candidate statements, moderated by viewer predispositions, produce within-debate changes in who is seen as winning. The research contributes by testing whether selective perception dominates debate reception or whether candidates can shift perceptions with specific statements.
Literature Review
Debate research emphasizes pre-existing preferences and cognitive dissonance mechanisms (selective exposure and selective perception) as filters shaping debate evaluations. Viewers tend to rate their preferred candidate as the winner; candidate images (credibility, competence, likability, leadership) also matter. While selective exposure is limited in debates (viewers receive all sides), selective perception may still rationalize dissonant information, though a tipping point can be reached where new information changes attitudes, especially among weak partisans. Heuristics like party identification guide processing, and pre-debate expectations about the winner can bias evaluations, though effects in Germany are often small. Debates can reshape candidate images; valence perceptions can influence post-debate judgments. Rhetorical strategies (attack, acclaim, defense) have mixed observed effects; attacks can backfire or help depending on content and recipient characteristics. RTR methods allow second-by-second measurement of audience reactions and can capture how not only what is said but how it is said affects perceptions. Prior studies suggest perceived debate performance is central to winner judgments, but evidence on the impact of individual statements is scarce. The authors identify two main questions: the extent to which political predispositions filter perceptions versus the ability of candidate statements to change winner judgments, and whether decisive statements share common rhetorical characteristics. They hypothesize: (1) pre-debate dispositions (winner expectation, chancellor preference, party ID, candidate images, age) shape the probability of change in winner perception; (2) agreement/disagreement with key speech moments (measured with RTR) determines change.
Methodology
Design: Quasi-experimental large-N field study during the second 2021 German TV Triell (Sept 12, 2021; ~90 minutes; ~11 million viewers on ARD/ZDF) with synchronized pre-/post-surveys and second-by-second RTR via a web app (Debat-O-Meter).
Sample: 11,000+ logins; ~9000 completed pre-survey; 8000+ provided at least one RTR; after activity filters and anti-spam checks, N=4613 participants with continuous RTR and post-survey. Sample is younger (56% <40), 55% male/43% female/1% diverse, highly educated (58% tertiary), high political interest (84% high/very high), and overrepresents Green identifiers/voters (39%). Recruitment via ~20 newspapers and PolitikPanel Deutschland. Convenience sample; not representative, but robustness checks with TV-only viewers (older, more conservative) yield similar results.
Device/measurement: The Debat-O-Meter app guided participants through a tutorial, the pre-survey, the RTR module, and the post-survey. An RTR slider ranging from −2 (very bad) to +2 (very good) was logged each second per candidate; inactivity was coded 0. Inputs were time-stamped and stored pseudonymously. Coding of the debate produced 293 candidate speaking segments (Baerbock 91, Laschet 100, Scholz 102). For each segment, positive and negative RTR inputs were summed; responses up to 4 seconds after a speech segment were attributed to it to account for response lag (a minimal sketch of this aggregation follows).
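The following Python sketch shows how such a per-segment aggregation might look, assuming a tidy per-second log; the column names, toy data, and the per-participant summation rule are illustrative assumptions, not the authors' pipeline:

```python
import pandas as pd

# Hypothetical per-second RTR log: one row per participant, second, and
# rated candidate, with slider values in [-2, +2] (inactivity coded 0).
rtr = pd.DataFrame({
    "participant": [1, 1, 1, 2],
    "second":      [100, 101, 106, 101],
    "candidate":   ["Baerbock", "Baerbock", "Baerbock", "Baerbock"],
    "value":       [2, 1, -1, 2],
})

# Hypothetical segment coding: speaker plus start/end second of each of
# the 293 candidate speaking segments.
segments = pd.DataFrame({
    "segment_id": ["V204"],
    "speaker":    ["Baerbock"],
    "start":      [95],
    "end":        [102],
})

LAG = 4  # responses up to 4 s after a segment are attributed to it

def per_segment_agreement(rtr: pd.DataFrame, segments: pd.DataFrame) -> pd.DataFrame:
    """Aggregate RTR inputs per participant and segment, honoring the lag."""
    cols = []
    for seg in segments.itertuples():
        window = rtr[
            (rtr["candidate"] == seg.speaker)
            & rtr["second"].between(seg.start, seg.end + LAG)
        ]
        cols.append(window.groupby("participant")["value"].sum().rename(seg.segment_id))
    return pd.concat(cols, axis=1)  # participants x segments matrix

print(per_segment_agreement(rtr, segments))
```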
Measures:
- Endogenous variable: change in debate winner perception (pre-debate winner expectation vs. post-debate winner judgment; options: Scholz, Baerbock, Laschet, tie). 44.5% (2053/4613) changed; 55.5% (2560) did not. Shifts: to Baerbock 24% (1091), to Laschet 10% (425), to Scholz 5% (250); from no pre-debate expectation to a named winner 2% (96); to a draw 4% (191). A sketch of how this variable can be derived follows the list.
- RTR variables: per-segment agreement for 293 speech moments (−2 to +2) for each candidate.
- Exogenous variables: party identification; pre-debate chancellor preferences (ranked for Baerbock, Laschet, Scholz); candidate images (credibility, likability, leadership, competence on −2 to +2); pre-debate winner expectation (Scholz 36% [1675], Baerbock 26% [1206], Laschet 15% [689], draw 21% [945]).
- Controls: age (7 categories), gender, education (6 categories), political interest (5-point).
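A minimal sketch of deriving the change variable and the candidate-specific outcomes for the separate models, assuming the pre/post survey responses sit in one table; the column names and toy rows are illustrative assumptions:

```python
import pandas as pd

# Hypothetical pre-/post-survey extract: winner expectation before the
# debate and winner judgment after it (Scholz / Baerbock / Laschet / tie,
# or missing when no pre-debate expectation was given).
df = pd.DataFrame({
    "pre_victor":  ["Scholz", "Baerbock", None, "tie"],
    "post_victor": ["Baerbock", "Baerbock", "Laschet", "tie"],
})

# Overall change in winner perception (the endogenous variable); a missing
# pre-debate expectation followed by a named winner counts as a change.
df["changed"] = df["pre_victor"].ne(df["post_victor"])

# Binary outcomes for the candidate-specific models (M1: change to
# Baerbock, M2: change to Laschet).
df["change_to_AB"] = df["changed"] & df["post_victor"].eq("Baerbock")
df["change_to_AL"] = df["changed"] & df["post_victor"].eq("Laschet")
print(df)
```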
Machine learning:
- Decision trees: binary splitting with the complexity parameter (CP) tuned between 0.01 and 0.015 to balance complexity against sensitivity; used to identify combined pathways linking predispositions and specific RTR reactions to change outcomes.
- Random forest (RF): 1250 trees; bagging with replacement; 250 candidate variables tried at each split out of 314 variables in total; evaluated on out-of-bag (OOB) performance (AUC, error rate, Brier score). Separate RFs were fit for change to Baerbock (M1) and change to Laschet (M2); for Scholz, class imbalance limited RF sensitivity, so decision trees were emphasized, and an RF predicting shifts away from Scholz is in the annex. A minimal sketch of an analogous setup follows this list.
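The CP tuning reported above suggests an rpart-style workflow, but that is an inference; the sketch below mirrors only the reported forest settings (1250 trees, 250 variables per split, OOB evaluation) in Python/scikit-learn on stand-in data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Stand-in data: 314 predictors (predispositions, controls, and 293
# per-segment RTR scores) for 4613 participants, with a binary change
# outcome. Random data, so the OOB AUC will hover near chance.
rng = np.random.default_rng(0)
X = rng.normal(size=(4613, 314))
y = rng.integers(0, 2, size=4613)

rf = RandomForestClassifier(
    n_estimators=1250,   # 1250 trees, as reported
    max_features=250,    # 250 candidate variables tried per split
    bootstrap=True,      # bagging with replacement
    oob_score=True,      # keep out-of-bag predictions
    n_jobs=-1,
    random_state=0,
).fit(X, y)

# OOB AUC from the out-of-bag class probabilities.
oob_auc = roc_auc_score(y, rf.oob_decision_function_[:, 1])
print(f"OOB accuracy: {rf.oob_score_:.3f}, OOB AUC: {oob_auc:.3f}")
```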
Model diagnostics (examples; a short recomputation sketch follows this list):
- Change to Baerbock (M1): OOB AUC 93.22; overall accuracy 87.4%; class error for changers 26.49%. Confusion matrix: TN=3241, FP=281, FN=289, TP=802.
- Change to Laschet (M2): OOB AUC 95.86; overall accuracy 95.9%; class error for changers 24.94%. Confusion matrix: TN=4109, FP=79, FN=106, TP=319.
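As a quick check, the headline metrics can be recomputed from the reported M1 confusion matrix with standard formulas; only the cell counts below come from the paper:

```python
# M1 confusion matrix as reported: TN=3241, FP=281, FN=289, TP=802.
tn, fp, fn, tp = 3241, 281, 289, 802

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of correct predictions
changer_error = fn / (fn + tp)              # class error among true changers

print(f"overall accuracy: {accuracy:.1%}")          # roughly 87%
print(f"changer class error: {changer_error:.2%}")  # 26.49%, as reported
```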
Ethics and data: Institutional approval (July 13, 2016); informed consent; anonymized data and replication materials available at Harvard Dataverse (DOI: 10.7910/DVN/CBAKME).
Key Findings
- Substantial within-debate change: 44.5% (2053/4613) altered their winner perception. Shifts toward Baerbock (1091; 24%) and Laschet (425; 10%) dominated; Scholz attracted fewer switchers (250; 5%) and lost more support than he gained.
- Predispositions matter most among ex-ante factors: Pre-debate winner expectation (pre.victor) and pre-debate chancellor preference were the strongest predictors of change according to the random-forest variable importance measures (VIMs), consistent with Hypothesis 1. Candidate images, especially credibility and competence, also contributed; likability and leadership mattered to a lesser degree. Party identification was largely uninformative for predicting change.
- Specific speech moments were decisive (Hypothesis 2): Across models, a small set of candidate statements strongly predicted changes when combined with predispositions, demonstrating that what candidates say at particular moments can shift perceptions.
Baerbock (change to AB; RF AUC 93.22):
- Key predictors: pre.victor (most important), RTR reactions to the climate-policy critique of the Grand Coalition (V204), Laschet's attack on "bans and slogans" (V157), Baerbock's EV counter-attack (V207), and her stance on a digitalization ministry (V137). Decision-tree pathways showed that, among viewers who had not expected Baerbock to win, strong agreement with V204 (>+1.5) led to frequent switches to Baerbock (Pathway D). Another pathway combined disapproval of V157 (≤+0.5), a mild-to-positive response to V207 (>−0.5), and a positive response to V137 (>+0.5), yielding a 59% switch rate in that subgroup (Pathway C). A sketch of how such pathways are read off a fitted tree appears at the end of this section.
Laschet (change to AL; RF AUC 95.86):
- Key predictors: pre-debate chancellor preference for Laschet (top VIM), pre.victor, and agreement with V288 (patriotic closing address; rank 3 VIM). Other decisive statements: V157 (attack on Baerbock's climate "bans"), V42 (Wirecard/Finance Ministry attack on Scholz), V191 (energy/climate responsibility dispute), V153 (a defensive statement on the sequencing of the nuclear and coal phase-outs), V273 (against tax hikes), and V262 (orderly immigration). Decision trees uncovered multiple pathways; e.g., Pathway E (viewers who had not expected Laschet to win), with V288 ≥ +1.5 and V42 ≥ −0.5 (i.e., at most mildly negative), produced 86% switches, mainly from Scholz to Laschet.
Scholz (change to OS; decision trees):
- Only ~1% of participants matched the main switching pattern, but within that subgroup 83% switched to Scholz given strong agreement with V17 (>+1.5; a humorous defense of not ruling out Die Linke while affirming voter sovereignty) plus a favorable reaction to his pension statement (V244 > +0.5). V244 acted as a fork between the AB and OS pathways; V200 also featured in the Scholz pathway.
- Demographics were weak predictors: Age showed some gradients for AB change, but socio-demographics generally had limited influence.
- Model performance indicates high predictive validity for change to AB and AL (RF AUCs >93), validating the combined use of predispositions and granular RTR data for forecasting within-debate perceptual change.
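To show how such decision-tree pathways are read in practice, here is a toy sketch; the data, the single planted rule, and the use of scikit-learn's export_text in place of the authors' tooling are assumptions for illustration only:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data: five stand-in RTR segment scores and a planted rule that
# strong agreement with "V204" (> +1.5) drives a switch.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(1000, 5))
y = (X[:, 0] > 1.5).astype(int)

feature_names = ["V204", "V157", "V207", "V137", "V42"]  # labels from the text
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each printed branch is a "pathway": a conjunction of threshold
# conditions ending in a predicted class, e.g. "V204 > 1.5 -> switch".
print(export_text(tree, feature_names=feature_names))
```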
Discussion
The study directly addresses whether debate reception is driven more by selective perception based on predispositions or by the persuasive force of specific candidate statements. Results show both mechanisms operate: pre-debate winner expectations and chancellor preferences anchor perceptions, but specific, salient statements can tip viewers toward a different winner judgment when they resonate with individual predispositions. Candidate images—especially credibility and competence—shape openness to persuasion, while stable party identification does little to explain the dynamics of change. Machine learning methods allowed simultaneous modeling of numerous RTR moments and predispositions, revealing that a handful of high-leverage speech segments (e.g., climate-policy exchanges, Wirecard attacks, patriotic closings, pension guarantees) measurably shifted perceptions in identifiable subgroups. These findings refine debate-effects theory by demonstrating that dynamic, moment-level reactions—not just average performance—drive within-debate changes, particularly in multiparty contexts with weaker partisan anchors.
Conclusion
The paper advances debate research by: (1) integrating large-N real-world RTR data with pre-/post-surveys and machine learning to predict within-debate changes in winner perception; (2) identifying how pre-debate expectations and candidate images interact with reactions to specific speech moments to produce perceptual shifts; and (3) introducing a replicable analytical toolbox (RF and decision trees) that pinpoints decisive moments and subgroup pathways. Substantively, pre-debate winner expectations and chancellor preferences are pivotal, while party identification is largely irrelevant for change. Several discrete statements—especially around climate policy, corruption oversight (Wirecard), value-affirming closings, and pension guarantees—were critical inflection points. Future research should: examine model sensitivity to class imbalance; incorporate emotional response measures; extend to other debates and more representative samples; systematically code all statements to generalize rhetorical-strategy effects; and further explore how ML-specific properties interact with debate data structures.
Limitations
- Quasi-experimental field design limits causal identification and may omit unobserved variables (e.g., emotions not included in main models).
- Convenience sample deviates from the German population (younger, more educated, more Green-leaning), constraining generalizability; TV-only robustness checks mitigate but do not eliminate concerns.
- Class imbalance (especially few positive changers to Scholz) reduced RF sensitivity for that outcome; decision trees addressed interpretability but not all ML limitations.
- Analytical approach identified decisive statements from RTR + ML rather than pre-coding all statements, limiting comprehensive rhetorical-strategy generalization.
- RTR data are autocorrelated, hierarchical, and time-series in nature, challenging standard linear assumptions; while ML handles nonlinearity well, dependencies remain a methodological consideration.