Using data science to understand the film industry’s gender gap

D. Kagan, T. Chesney, et al.

This groundbreaking study by Dima Kagan, Thomas Chesney, and Michael Fire explores gender bias in movies using innovative data science techniques. By analyzing IMDb data alongside movie dialogue subtitles, they reveal a promising trend towards greater representation of women in film, including increased roles and influence. Discover their new approach to evaluating female characters that surpasses the traditional Bechdel test!

Introduction
The film industry reaches billions of viewers and shapes attitudes, behaviors, and self-image. Prior work shows that women are underrepresented and stereotyped in film, with male directors predominating and male speaking roles outnumbering female roles. This study aims to systematically quantify and track gender disparities in movies by constructing large-scale character interaction networks from subtitles and IMDb data. The authors pose four research questions: (1) Are there genres without a gender gap? (2) What do character relationships reveal about gender, and how has this changed over time? (3) Are women receiving more central roles today? (4) How has the fairness of female representation changed over time? The study develops a scalable methodology (Subs2Network) and analytical framework to address these questions across nearly a century of film.
Literature Review
The paper situates its work within social network analysis (SNA) applied to films. Prior methods include RoleNet (linking co-appearance in scenes via image processing and face recognition), Character-net (dialog-based networks from script–subtitle alignment), screenplay parsing approaches using machine learning, and CoCharNet (weighted co-appearance networks for importance ranking). StoryRoleNet combined video and subtitles to improve accuracy but was evaluated on only a few titles. A publicly available Moviegalaxies dataset exists but with limited methodological transparency. Beyond SNA, numerous studies document gender gaps across domains (e.g., citation disparities, Wikipedia coverage differences, media underrepresentation). In film, women are underrepresented and often stereotyped. The Bechdel test—requiring two named women who talk to each other about something other than a man—has become a common but imperfect benchmark for studying gender bias. Recent NLP-based analyses of screenplays reveal gendered portrayals and centrality differences, while also highlighting shortcomings of the Bechdel test (e.g., a film can pass with minimal female interaction).
Methodology
Data sources: The study fuses IMDb datasets (titles, crew, ratings, votes) with crowdsourced English subtitles (downloaded via Subliminal) for 15,540 full-length movies. Bechdel test labels are drawn from the Bechdel Test Movie List (7,871 movies; 7,322 full-length).

Subs2Network algorithm: Subtitles are processed with NER (Stanford NER and spaCy) to extract person/organization entities with timestamps. Entities are matched to character lists (IMDb/TMDb) using heuristics and fuzzy matching (FuzzyWuzzy WRatio), handling partial names, aliases, and ambiguous surnames. Hearing-impaired subtitles, when available, provide speaker tags that improve mapping. A movie’s character social network G=(V,E) is constructed where vertices are characters and edges represent inferred interactions within temporal proximity; edge weights accumulate co-occurrences. To reduce noise, edges with weight below w_min=3 are filtered.

Matching heuristic (Algorithm 1): Roles are split into first/last names; unique name matches are linked directly, and otherwise WRatio maps partial mentions to full names above a similarity threshold.

Evaluation of constructed networks: Quality is assessed by (a) central-character analysis—comparing top-5/top-10 central nodes against IMDb credit order (excluding titles with alphabetical credits) and against the screenplay-based ScriptNetwork—and (b) edge coverage—comparing overlapping subgraphs with ScriptNetwork via Coverage(G)=|E(G)∩E(H)|/|E(G)|. A small additional benchmark of 15 movies is created from Amazon X-Ray (co-appearance by scene) to compare node/edge overlap and character detection by screen time.

Preprocessing and name validation: Characters are matched across IMDb and TMDb by actor names; the longer character string is used to capture more variants (e.g., Santino/Sonny Corleone in The Godfather).
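The construction pipeline above can be sketched end to end on toy input. The paper’s actual matcher uses FuzzyWuzzy’s WRatio and Stanford/spaCy NER; this minimal stand-in uses stdlib difflib for the fuzzy step, and the mention list, character names, and 60-second co-occurrence window are illustrative assumptions (the paper specifies w_min = 3 but not these exact values).

```python
import difflib
from itertools import combinations
from collections import Counter

def match_mention(mention, characters, cutoff=0.8):
    """Map a subtitle mention (possibly partial) to a full character name.
    difflib stands in here for FuzzyWuzzy's WRatio."""
    # Direct link when the mention is a unique first-/last-name token.
    hits = [c for c in characters if mention.lower() in c.lower().split()]
    if len(hits) == 1:
        return hits[0]
    # Otherwise fall back to fuzzy string similarity above a threshold.
    best = difflib.get_close_matches(mention, characters, n=1, cutoff=cutoff)
    return best[0] if best else None

def build_network(mentions, characters, window=60.0, w_min=3):
    """mentions: list of (timestamp_seconds, raw_name). Characters mentioned
    within `window` seconds of each other accumulate edge weight."""
    resolved = [(t, match_mention(m, characters)) for t, m in mentions]
    resolved = [(t, c) for t, c in resolved if c is not None]
    weights = Counter()
    for (t1, c1), (t2, c2) in combinations(resolved, 2):
        if c1 != c2 and abs(t1 - t2) <= window:
            weights[frozenset((c1, c2))] += 1
    # Filter weak edges (weight < w_min) to reduce noise, per the paper.
    return {e: w for e, w in weights.items() if w >= w_min}

def coverage(edges_g, edges_h):
    """Coverage(G) = |E(G) ∩ E(H)| / |E(G)|, used to compare the subtitle
    networks against the screenplay-based ScriptNetwork."""
    return len(set(edges_g) & set(edges_h)) / len(edges_g)
```

For example, three “Sonny”/“Michael” mentions falling within the window produce a single edge of weight 3, which survives the w_min = 3 filter, while an isolated mention yields no edge.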
Feature engineering: Five feature groups are computed: (1) vertex features (total weight/strength, closeness, betweenness, degree, clustering, PageRank); (2) network features (|V|, |E|, density, and distributional statistics over vertex metrics); (3) gender representation features (counts and percentages of triangles by number of women, 0–3; number of women in top-10 PageRank roles; male/female counts); (4) movie features (year, IMDb rating, runtime, genres, votes); (5) actor features (birth/death years; age at filming). Actor gender is inferred from IMDb role labels (actor/actress), with name–gender mapping used where needed.

Analytical design: To test for gender gaps by genre, the authors analyze popular movies (≥200 IMDb votes), split features by gender and genre, and apply Mann–Whitney U tests. Relationship structures are examined via triangle counts and gender composition over time and by genre. Centrality over time is assessed via PageRank-based rankings for top roles and the gender composition of the top-1/3/10 central roles.

Bechdel classifier: A Random Forest (max depth = 5) is trained on the 1,000 newest Bechdel-labeled movies using network, vertex, and gender representation features; performance is evaluated with AUC and compared against prior work. Feature importance highlights which attributes drive Bechdel prediction.

Alternative gender equality metric: The authors propose the Gender Degree Ratio test, which compares the total degree (interaction volume) of female versus male characters, recommending a fairness band of 0.8 < (TotalDegreeFemale / TotalDegreeMale) < 1.2. Normality of the ratios is checked via Shapiro–Wilk, and significance tests compare groups (e.g., feminist lists, male-centric lists, and movies that pass the Bechdel test but are arguably unfair).
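Two of the quantities described above—triangle gender composition and the Gender Degree Ratio—can be sketched on a toy weighted-edge dictionary. This is a minimal illustration, not the authors’ released code; the character names and the character→gender map are invented for the example.

```python
from itertools import combinations

def triangle_gender_counts(edges, gender):
    """Count triangles by how many of their three characters are women
    (0-3) -- the feature group that dominated Bechdel prediction.
    edges: {(u, v): weight}; gender: {character: "F" or "M"}."""
    adj = {}
    for (u, v), _w in edges.items():
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    counts = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(sorted(adj), 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            counts[sum(gender[x] == "F" for x in (a, b, c))] += 1
    return counts

def gender_degree_ratio(edges, gender):
    """Total weighted degree (interaction volume) of female characters
    divided by that of male characters."""
    deg = {}
    for (u, v), w in edges.items():
        deg[u] = deg.get(u, 0) + w
        deg[v] = deg.get(v, 0) + w
    female = sum(d for c, d in deg.items() if gender[c] == "F")
    male = sum(d for c, d in deg.items() if gender[c] == "M")
    return female / male

def passes_degree_ratio_test(edges, gender, low=0.8, high=1.2):
    """The fairness band proposed in the paper: 0.8 < ratio < 1.2."""
    return low < gender_degree_ratio(edges, gender) < high
```

Note that, unlike the binary Bechdel label, the ratio is symmetric: a heavily female-skewed network (ratio well above 1.2) fails the band just as a male-skewed one does.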
Key Findings
- Dataset and framework: Constructed 15,540 movie social networks (the largest such corpus to date); open-source tools released.
- Gender gap by genre: Many genres still show significant differences between male and female roles. Genres with the most similar feature distributions across genders include film-noir, history, horror, music, musical, mystery, and war (9/10 features similar; clustering differs). Total Weight and Weighted Betweenness are most often similar (15/21 genres), while actor age at filming is least similar (0/21).
- Relationship structures: Across all movies, triangle composition is heavily male: 40.74% contain no women (three men), 36.56% one woman, 19.14% two women, and only 3.57% three women. Romance has more triangles that include women; War and Action are the most male-dominated.
- Centrality of women: In 2018 releases, women comprised a median of 30% and a mean of 33% of the top-10 central roles, with a clear upward trend over the decades. Top-10 central roles remain predominantly male; only five films had all-female top-10 roles, all in women-centric settings.
- Bechdel classifier: The Random Forest achieved AUC = 0.81 and outperformed prior work (higher F1). The top features were gender-triangle-based (e.g., percent of triangles with two women, percent with zero women, number of women in top-10 roles, percent with three women). The average predicted probability of passing the Bechdel test rises over time and varies by genre (e.g., historically low in War).
- Network evaluation: Subs2Network overlapped with 628/773 ScriptNetwork titles. Compared to IMDb credit order, Subs2Network matched more top central characters than ScriptNetwork: top-5 overlap 2.80 vs. 2.70; top-10 overlap 6.06 vs. 5.35. Edge coverage between the methods was similar (~65%). Versus Amazon X-Ray, node overlap was 79.6% and edge overlap 54.5%; detection of main characters by screen time reached up to 96.4%.
- Alternative metric (Gender Degree Ratio): The average ratio across movies is ~0.6, i.e., roughly 6 female interactions per 10 male interactions; only about 12% of movies fall within the proposed fairness band. Examples: Resident Evil: Retribution (1.06) and The Hunger Games (0.94) score near parity, while Madagascar (~0.20) and Batman Begins (~0.24) are highly imbalanced. The test differentiates feminist lists (higher ratios) and flags movies that pass the Bechdel test yet remain imbalanced, though it is context-insensitive.
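The central-character evaluation reported above (top-5/top-10 overlap with credit order) reduces to a set-overlap computation. A sketch, assuming centrality scores and a credit-order list as plain Python inputs; the paper derives centrality from PageRank on the constructed network and the reference ordering from IMDb credits:

```python
def topk_overlap(centrality, credit_order, k):
    """Size of the intersection between the k most central characters and
    the first k names in the film's credit order. Averaged over films,
    this is the reported top-5 (2.80) / top-10 (6.06) overlap score.
    centrality: {character: score}; credit_order: list of characters."""
    top_central = sorted(centrality, key=centrality.get, reverse=True)[:k]
    return len(set(top_central) & set(credit_order[:k]))
```

An overlap of k means the network’s k most central characters exactly match the film’s first k billed roles.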
Discussion
Findings show persistent gender disparities in both interaction structures and role centrality, though there is a steady improvement over time. Relationship triangles reveal strong male dominance across genres, with Romance relatively more balanced and Action/War most male-heavy. The growing share of women in top-10 central roles indicates progress, yet full parity remains rare. The Bechdel classifier captures fairness trends better than a binary label by producing calibrated probabilities and relying on structural gender features, but the traditional test can be satisfied trivially and misses centrality and stereotype context. The proposed Gender Degree Ratio addresses some Bechdel shortcomings by considering aggregate interaction balance across genders; it effectively distinguishes feminist and male-centric sets and exposes “false positives” of the Bechdel test. However, it does not incorporate narrative context or qualitative portrayal. Overall, the large-scale network approach provides robust, reproducible insights into where and how gender bias manifests in films, aligning with and extending prior qualitative and small-scale quantitative studies.
Conclusion
The study introduces Subs2Network to transform subtitles into character social networks at unprecedented scale, releasing 15,540 networks, code, and features for the community. Analyses across genres and decades demonstrate that a gender gap remains pervasive—e.g., male-majority interaction triangles are about 3.5 times more common than female-majority ones—yet female centrality and Bechdel-passing likelihood have risen over time. A machine-learning Bechdel classifier (AUC 0.81) provides automated, probabilistic assessment, and an alternative Gender Degree Ratio test better captures interaction balance. Future work includes extending to TV series, longitudinal analyses of actors’ and directors’ careers, incorporating additional fairness tests, and advancing methods (e.g., deep learning, coreference, improved NER) to capture context and reduce noise, yielding richer assessments of gender representation.
Limitations
- Data quality and coverage: Subtitle accuracy varies (spelling mistakes, inconsistencies); not all movies have high-quality or hearing-impaired subtitles. IMDb/TMDb cast lists include many unnamed minor roles (e.g., “Guard #2”) that are hard to map.
- Entity matching challenges: Alias handling, partial names, and multiple characters sharing a surname cause ambiguity; superhero/epithet names (e.g., “Captain America”) complicate NER and filtering.
- Network construction noise: False positives arise from mismatches and from multiple scenes within short time intervals; the edge-weight threshold mitigates this but may also remove true weak ties.
- Evaluation constraints: Perfect ground truth is infeasible at scale. Script-based networks differ from final films (draft revisions; added, removed, or renamed characters), limiting comparability. Amazon X-Ray offers closer-to-ground-truth data but is not fully public and required manual extraction.
- Metric limitations: The Gender Degree Ratio does not account for narrative context or qualitative stereotypes; Bechdel-based labels are noisy and debated, particularly for older films.
- Sample selection: Restricting to popular movies (≥200 votes) may bias results toward mainstream titles with better metadata.