
Modeling narrative features in TV series: coding and clustering analysis

M. Rocchi and G. Pescatore

This research by Marta Rocchi and Guglielmo Pescatore examines the narrative structure of US medical TV series, analyzing over 400 hours of video to identify their key narrative features. The data-driven approach reveals the genre's notable stability over time and how individual series balance medical, professional, and sentimental storylines.

Introduction

The paper advances a data-driven examination of systemic aspects of television series. It conceptualizes them as narrative ecosystems whose evolution reflects internal narrative trends and constraints as well as external production, distribution, audience, regulatory, and social factors. Focusing on US medical dramas, one of the most popular and durable TV genres, the study analyzes eight series (Grey's Anatomy, Miami Medical, The Night Shift, Chicago Med, Code Black, The Good Doctor, The Resident, New Amsterdam) spanning 32 seasons and 608 episodes. Its methodology treats entire series as datasets in order to explore evolutionary dynamics and patterns in narrative structure. Building on prior interpretive work that identifies three isotopies (plots) in medical dramas, namely the medical cases plot (anthology) and the professional and sentimental plots (running), the paper investigates these dimensions quantitatively. The research questions are:

RQ1: Are isotopies (i.e., the medical cases plot, the professional plot, and the sentimental plot) good descriptors for the medical drama genre?
RQ2: Are there any differences within the formulaic aspects of these series? Are there any significant differences considering the relationship between the three isotopies within different series?
RQ3: How do the narrative plots of a medical drama change over time?

The study thus aims to validate the isotopies as quantitative descriptors, compare formulaic differences across series, and assess the temporal evolution of plot balances.

Literature Review

The work draws on the narrative ecosystems paradigm (Innocenti and Pescatore 2012, 2018; Pescatore et al., 2014; Rocchi and Pescatore, 2019) and semiotic notions of isotopy (Greimas and Courtés, Eco). It positions medical drama within genre/formula studies (Schatz, Jovanović; Albuquerque & Meimaridis) and leverages established content analysis traditions in media research (e.g., Fernández-Collado et al.; Signorielli & Bacue; Barker et al.; Chapoton et al.). It also references scriptwriting conventions (Campbell; Snyder; Vogler) as part of broader self-regulatory mechanisms of serial production. Prior medical drama scholarship and audience/health perception studies provide context for the genre’s cultural significance.

Methodology

Corpus and unitization: The sample comprises eight US medical dramas across 32 seasons and 608 episodes. Episodes were manually segmented into units (segments) defined by continuity of space, time, and action and by thematic-narrative invariance. Annotation was carried out in ELAN.

Isotopy definitions: Three plots (isotopies) were operationalized: the medical cases plot (MC; doctor–patient interactions and case stories, constituting the anthology dimension), the professional plot (PP; workplace relationships such as hierarchy, competition, and ethics), and the sentimental plot (SP; intimate and emotional relationships among the main characters, including couples, friendship, family, and conflicts).

Coding and weighting: Each segment received one or more isotopy labels, each with a weight from 1 to 6, and the segment's duration was allocated across overlapping isotopies in proportion to these weights (e.g., a 66 s segment with SP weight 4 and PP weight 2 yields 44 s of SP and 22 s of PP). Content that could not be attributed to any isotopy (e.g., landscape shots, titles) was marked as uncoded. This produced time series of narrative biomass (share of running time) per isotopy for episodes, seasons, and series.

Reliability: Coders received training and a detailed coding guide, and all coding was supervised. To assess intra-coder reliability, 15% of episodes were re-coded by the supervisor after two years and compared using the Intraclass Correlation Coefficient (ICC). Episode-level ICCs exceeded 0.80: sentimental plot 0.97, professional plot 0.81, medical cases plot 0.92. The time intensity of manual coding was acknowledged as a constraint.

Clustering framework: Clustering analyses were performed to address RQ1 and RQ2. For RQ1 (season level), each season was represented by a 4D vector of the median SP, PP, MC, and uncoded percentages across its episodes. Clustering tendency was assessed with the Hopkins statistic (H = 0.634), rejecting the hypothesis of spatial randomness. Methods for estimating the number of clusters gave differing suggestions (Elbow = 4, Silhouette = 3, Gap = 2).
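
The weighting rule and the 4D season descriptor can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes each annotated segment is stored as a `(duration_s, weights)` pair, divides a segment's duration by the sum of its weights (which reproduces the 66 s example), and uses hypothetical function names (`allocate_segment`, `episode_shares`, `season_vector`).

```python
from statistics import median

ISOTOPIES = ("SP", "PP", "MC")
LABELS = ISOTOPIES + ("uncoded",)


def allocate_segment(duration_s, weights):
    """Split one segment's duration across its isotopy labels.

    weights maps an isotopy label to its weight (1-6). Assumption: time is
    allocated as duration * weight / (sum of the segment's weights).
    An empty dict marks the segment as uncoded.
    """
    if not weights:
        return {"uncoded": float(duration_s)}
    total = sum(weights.values())
    return {label: duration_s * w / total for label, w in weights.items()}


def episode_shares(segments):
    """Percent of an episode's running time assigned to each label.

    segments: hypothetical list of (duration_s, weights) pairs,
    one per annotated segment.
    """
    seconds = dict.fromkeys(LABELS, 0.0)
    for duration_s, weights in segments:
        for label, secs in allocate_segment(duration_s, weights).items():
            seconds[label] += secs
    runtime = sum(seconds.values())
    return {label: 100 * secs / runtime for label, secs in seconds.items()}


def season_vector(episodes):
    """4D season descriptor: median SP, PP, MC and uncoded percentages."""
    shares = [episode_shares(ep) for ep in episodes]
    return tuple(median(s[label] for s in shares) for label in LABELS)
```

For instance, `allocate_segment(66, {"SP": 4, "PP": 2})` returns `{"SP": 44.0, "PP": 22.0}`. Using medians rather than means keeps a single atypical episode from shifting the season vector much.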

For the RQ1 season vectors, divisive hierarchical clustering (DIANA) was selected on the basis of internal validity and stability metrics, and cluster stability was evaluated with clusterboot using 100 bootstrap resamples. For RQ2 (series-level typical episodes), each series was represented by a 4D vector of the median SP, PP, MC, and uncoded percentages across all its episodes; clustering tendency again supported the presence of structure (Hopkins H = 0.535). Between-series differences in episode-level isotopy shares were tested with the Kruskal–Wallis test for PP, SP, and MC (all p < 0.0001). For RQ3, season-wise time series of isotopy shares were examined to track temporal evolution and possible inversions among the dominant plots.
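
The Kruskal–Wallis test compares the rank distributions of k independent groups, here the episode-level shares of one isotopy in each series. With eight series the statistic is referred to a χ² distribution with k − 1 = 7 degrees of freedom, which is why the RQ2 results are reported as χ²(7). The sketch below computes H with the standard tie correction in plain Python; it is illustrative only, the function names are hypothetical, and in practice a library routine such as `scipy.stats.kruskal` (which also returns the p-value) would normally be used.

```python
def rank_with_ties(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, df) for k non-empty independent samples, tie-corrected.

    H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1),
    divided by 1 - sum(t^3 - t) / (N^3 - N) when ties are present.
    """
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = rank_with_ties(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        r = ranks[start:start + len(g)]  # ranks belonging to this group
        rank_term += sum(r) ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction based on the size t of each group of equal values.
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    if ties:
        h /= 1 - ties / (n ** 3 - n)
    return h, len(groups) - 1
```

For example, `kruskal_wallis([1, 2, 3], [4, 5, 6])` gives H ≈ 3.857 with 1 degree of freedom; passing eight lists (one per series) would give df = 7.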

Key Findings

RQ1: The three isotopies are good quantitative descriptors of medical dramas. Season-level hierarchical clustering produced four stable clusters (clusterboot stability values near 1), and seasons tended to cluster by series. The Grey's Anatomy (GA) cluster had a stability of 0.978, while the first two GA seasons clustered instead with The Resident (TR) in a cluster with stability 0.925. A mixed cluster containing Chicago Med (CM), The Night Shift (TNS), The Good Doctor (TGD), and New Amsterdam (NA) had stability 0.992, and Code Black (CB) and Miami Medical (MM) formed a stable cluster (0.955).

RQ2: Significant between-series differences exist in PP, SP, and MC (Kruskal–Wallis: PP χ²(7) = 132.95, p < 0.0001; SP χ²(7) = 272.56, p < 0.0001; MC χ²(7) = 348.87, p < 0.0001). Typical-episode percentages (series-level medians):

Grey's Anatomy (GA): PP 18.77%, SP 48.07%, MC 30.30%, uncoded 1.89%
Miami Medical (MM): PP 8.57%, SP 20.30%, MC 67.70%, uncoded 3.43%
The Night Shift (TNS): PP 5.91%, SP 35.89%, MC 55.70%, uncoded 1.82%
Code Black (CB): PP 11.16%, SP 19.42%, MC 62.64%, uncoded 3.75%
New Amsterdam (NA): PP 15.90%, SP 29.63%, MC 53.13%, uncoded 1.35%
The Good Doctor (TGD): PP 9.29%, SP 35.00%, MC 54.48%, uncoded 1.23%
The Resident (TR): PP 20.18%, SP 25.56%, MC 51.18%, uncoded 2.59%
Chicago Med (CM): PP 9.42%, SP 26.21%, MC 64.64%, uncoded 0.75%

Four narrative profiles emerged:

Soap formula: GA, with a dominant sentimental plot (~48%).
Anthology formula: MM and CB, with a strong emphasis on medical cases (MM ~68–70%; CB ~63–67%) and slightly more uncoded content (~3%).
Doctors and patients formula: CM, TNS, and TGD, balancing patient and doctor stories, with a lower PP share (about 6–9%).
Social formula: NA and TR, with an elevated professional plot (NA ~16%; TR ~19%) emphasizing ethical and social dimensions.

RQ3: Temporal evolution shows strong formulaic stability. Across 32 seasons, only two inversions between isotopies were recorded. GA shifted to a sentimental-dominant profile from season 3 (possibly coinciding with a scheduling change). TGD shows a potential trend toward the soap formula by season 3. TR and NA display a higher professional emphasis, with TR season 1 particularly distinct. Outside GA, the prevailing order is MC > SP > PP, and PP is consistently the lowest share.
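
The rank-order claims can be checked directly against the typical-episode medians listed for RQ2. The sketch below sorts each series' PP, SP, and MC shares and, for the temporal analysis, counts how often one plot overtakes another between consecutive seasons. That inversion count is a hypothetical operationalization chosen for illustration, since the summary does not state how the authors defined an inversion, and the per-season values in the usage example are invented.

```python
from itertools import combinations

# Typical-episode medians (%) per series, copied from the RQ2 list.
TYPICAL = {
    "GA": {"PP": 18.77, "SP": 48.07, "MC": 30.30},
    "MM": {"PP": 8.57, "SP": 20.30, "MC": 67.70},
    "TNS": {"PP": 5.91, "SP": 35.89, "MC": 55.70},
    "CB": {"PP": 11.16, "SP": 19.42, "MC": 62.64},
    "NA": {"PP": 15.90, "SP": 29.63, "MC": 53.13},
    "TGD": {"PP": 9.29, "SP": 35.00, "MC": 54.48},
    "TR": {"PP": 20.18, "SP": 25.56, "MC": 51.18},
    "CM": {"PP": 9.42, "SP": 26.21, "MC": 64.64},
}


def plot_order(shares):
    """Return isotopy labels sorted from largest to smallest share."""
    return tuple(sorted(shares, key=shares.get, reverse=True))


def count_inversions(seasons):
    """Count pairwise rank swaps between consecutive seasons.

    seasons: list of {label: share} dicts in broadcast order.
    Hypothetical definition: an inversion occurs when the sign of
    share[a] - share[b] flips from one season to the next (ties are ignored).
    """
    inversions = 0
    for prev, curr in zip(seasons, seasons[1:]):
        for a, b in combinations(sorted(prev), 2):
            if (prev[a] - prev[b]) * (curr[a] - curr[b]) < 0:
                inversions += 1
    return inversions


if __name__ == "__main__":
    for name, shares in TYPICAL.items():
        print(name, " > ".join(plot_order(shares)))
    # Invented three-season example in which SP overtakes MC once.
    toy = [
        {"SP": 30, "PP": 20, "MC": 50},
        {"SP": 45, "PP": 18, "MC": 37},
        {"SP": 48, "PP": 19, "MC": 33},
    ]
    print("inversions:", count_inversions(toy))
```

Applied to the typical-episode medians, `plot_order` returns `("SP", "MC", "PP")` for GA and `("MC", "SP", "PP")` for the other seven series, so PP is last in every case.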

Discussion

The validation of the three isotopies as discriminative descriptors confirms their suitability for modeling medical drama narratives (addressing RQ1). Clusters aligned with series identities and production logics, indicating that aggregate narrative biomass reliably captures formulaic signatures. The presence of four stable narrative profiles (RQ2) clarifies how products differentiate within a common genre framework—e.g., GA’s soap orientation versus MM/CB’s anthology emphasis, and NA/TR’s social-professional tilt. The dynamic analysis (RQ3) reveals a high degree of stability and self-regulation in narrative variables: despite ongoing introduction of new cases and characters, the relative shares of isotopies remain stable over long runs. This suggests that beyond local creative choices, broader systemic constraints (episode length, network conventions, production economics, audience expectations) regulate narrative formulas. The findings support an ecosystemic approach wherein narrative variables are shaped by internal and external forces, and they highlight opportunities to use isotopy balances to compare series and anticipate trajectories.

Conclusion

The study introduces and validates a quantitative, data-driven framework for modeling TV series narratives via three isotopies—sentimental, professional, and medical cases—demonstrating their effectiveness in distinguishing series and identifying four robust narrative profiles within US medical dramas. It shows that, contrary to assumptions of fluid creative reconfiguration, the overall balance of narrative plots is highly stable over time, consistent with self-regulatory mechanisms in narrative ecosystems. Practical implications include improved understanding of product positioning for producers and potential inputs for content-based recommendation systems for viewers. Future research includes: expanding and diversifying the corpus (e.g., long-running ER, House; additional genres); generalizing isotopies to a cross-genre schema (soap plot, genre plot, anthology plot); conducting finer-grained dynamic analyses (e.g., sequence analysis) and integrating exogenous variables (production, distribution, regulation, audience metrics) into systemic models linking narrative features with industrial and contextual factors.

Limitations

Manual segmentation and coding are time-consuming and dependent on coder expertise; while training, supervision, and intra-coder ICCs (SP 0.97; PP 0.81; MC 0.92) support consistency, coder misunderstandings can systematically bias isotopy balances. Automated segmentation tools were not available. The study aggregates at the season level to mitigate high episode-level variability, potentially obscuring fine-grained fluctuations. Optimal cluster number estimates were inconsistent across methods; hierarchical clustering was chosen based on internal/stability measures. The sample is limited to eight US medical dramas over 32 seasons; some comparative tests (e.g., post-hoc pairwise differences after Kruskal–Wallis) were beyond scope. Findings for younger series (e.g., The Good Doctor) are provisional pending additional seasons.
