Computational appraisal of gender representativeness in popular movies

Sociology


A. Mazières, T. Menezes, C. Roth

This research, conducted by Antoine Mazières, Telmo Menezes, and Camille Roth, develops automated methods for analyzing gender representation in blockbuster films, revealing significant trends over three decades. While confirming the ongoing under-representation of women, the study highlights an encouraging shift towards more equitable portrayals across genres and budgets.
Introduction

The study situates itself within a long tradition of research on sex roles in mass media, historically relying on manual content analysis to document stereotypes, occupational roles, and portrayals of women and men. Prior reviews have consistently found under-representation and sexualization of women across media, though some null results appear in specific contexts. Methodologically, manual coding is hard to scale and often constrained to available metadata (e.g., IMDb). The authors propose leveraging advances in AI for automated processing of text, image, and video to construct datasets relevant to sex role research at scale. They focus on cinema, assembling a large corpus of popular films across more than three decades, and apply face detection and gender inference on sampled frames to quantify on-screen presence. They emphasize evaluating and correcting for algorithmic bias, propose a metric (female face ratio, FFR), compare it to the Bechdel test, and analyze temporal trends and framing asymmetries. The purpose is to demonstrate that computational methods can uncover historical trends and support more ambitious research questions that are otherwise difficult to address with manual approaches.

Literature Review

The paper reviews scholarship on gender roles in media, noting enduring findings of women’s under-representation and sexualization (Busby, 1975; Collins, 2011; Rudy et al., 2010), and large-scale studies on TV and films (Lauzen, 2018; 2019; Smith et al., 2019; Townsend et al., 2019). It highlights the limits of manual content analysis and metadata-based approaches (Lindner et al., 2015; Yang et al., 2020). The authors connect to emerging computational visual analysis frameworks such as “distant viewing” and “computational media intelligence” (Arnold & Tilton, 2019; Somandepalli et al., 2021). Related technical works include measuring on-screen presence and speaking time (Guha et al., 2015a; Somandepalli et al., 2021) and object associations by gender (Jang et al., 2019). The paper also discusses algorithmic accuracy and bias, referencing dataset and fairness concerns (ImageNet; Deng et al., 2009; Buolamwini & Gebru, 2018; Crawford & Paglen, 2019), and domain-specific performance variability (McBee et al., 2018; Zech et al., 2018).

Methodology

Corpus selection: The authors compiled films present on two user-driven platforms: YIFY (yts.mx) and IMDb (imdb.com). Starting from 13,662 YIFY movies (≥3 seeders as of December 2019), they linked each title to IMDb, excluded documentaries and animation, and required availability of year, genres, user rating, parental rating, runtime, budget, and worldwide gross. Because earlier years were sparsely covered, temporal analyses focus on 1985–2019, yielding 3,776 films (average runtime 109±18 min). Budgets: median $23M (Q1 $10M, Q3 $45M). Worldwide gross: median $43M (Q1 $11M, Q3 $122M).

Frame sampling: To remain representative across varying shot durations and editing paces, they extracted one frame every 2 seconds, producing more than 12.4 million images.

Automated detection: Using the Wolfram Mathematica Engine 12, they applied face detection and binary gender inference to each sampled frame, detecting roughly 10 million faces across more than 6.6 million images (on average 2,596±1,090 faces per movie). Bounding boxes provided face positions and areas relative to the frame.

Human validation and bias estimation: They randomly selected 1,000 frames with exactly one detected face, balanced by inferred gender (500 female, 500 male). A web interface asked human raters two questions: (1) whether the framed item is a face and, if so, its gender; (2) whether other faces are present outside the box. 4,938 reviews were collected (mean 4.94±2.29 per frame), with the majority vote per frame taken as ground truth. Confusion matrices:
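As a rough check, the 2-second sampling rate is consistent with the reported corpus size. A back-of-the-envelope sketch (the per-film figures are the reported averages, so the exact total will differ as runtimes vary):

```python
# Back-of-the-envelope check: does 1 frame / 2 s over ~3,776 films
# of ~109 min each yield a corpus on the order of 12 million frames?
n_films = 3776          # films retained for 1985-2019
avg_runtime_min = 109   # average runtime reported for the corpus
sample_period_s = 2     # one frame extracted every 2 seconds

frames_per_film = avg_runtime_min * 60 // sample_period_s
total_frames = n_films * frames_per_film
print(frames_per_film)  # 3270 frames for an average-length film
print(total_frames)     # 12347520, i.e. about 12.3 million frames
```

This lands within about 1% of the reported >12.4 million images; the small gap is expected since actual runtimes spread around the mean.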

  • Face detection: TP=977, FP=23, FN=137, TN=863 (accuracy 92%). Errors skewed toward missed detections (FN>FP).
  • Gender inference: For model-inferred Female, human raters judged 304 Female, 162 Male, 18 Doubt, 16 No face; for model-inferred Male, 75 Female, 410 Male, 8 Doubt, 7 No face. Overall accuracy was 73.9%, with precision of 65% for Female and 84.5% for Male; the model thus over-predicts female relative to ground truth.

Bias correction: Let λ be the probability that a face predicted male is truly male, and λ′ the analogous probability for predicted-female faces. Reallocating predicted genders proportionally to the confusion matrix gives FFR_corrected = (1 − λ) + (λ + λ′ − 1) × FFR_raw. Because errors vary over time, period-specific λ, λ′ were used for temporal analyses.

Measures: The primary metric, FFR, is the proportion of faces classified as female among all detected faces in a movie. FFR was compared with Bechdel test outcomes (bechdeltest.com; n=2,454 overlapping films) across the top 10 genres (Spearman > 0.93 for genre ordering). Temporal analysis used four quartile periods (equal numbers of films), examining FFR distributions and Bechdel pass rates.

Framing analyses (2014–2019 subset): Given the higher and more symmetric gender detection accuracy (~78%) in recent years, they examined:
  • Face-ism proxy: face bounding box area as percent of frame; compared male vs. female medians; Mann-Whitney U test for differences.
  • Mise-en-scène/cadre: frequency of gender combinations per frame (e.g., 0F/1M, 1F/1M, 0F/2M), and face position distributions on a 3×3 rule-of-thirds grid. Chi-square tests assessed dependence of position distributions across configurations, also aggregated by horizontal and vertical thirds.

Audience relations: For the 2014–2019 FFR histogram, they overlaid rankings by budget, worldwide gross, IMDb rating value, rating count, and proportion of female ratings, using grayscale ranks to compare their alignment with FFR bins.
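The confusion-based correction described above can be sketched as a small function (a minimal sketch; the function and variable names are ours, not the authors'):

```python
def corrected_ffr(raw_ffr, lam_male, lam_female):
    """Reallocate predicted genders proportionally to the confusion matrix.

    lam_male:   probability that a face predicted male is truly male
    lam_female: probability that a face predicted female is truly female

    True female fraction = lam_female * raw_ffr + (1 - lam_male) * (1 - raw_ffr),
    which simplifies to (1 - lam_male) + (lam_male + lam_female - 1) * raw_ffr.
    """
    return (1 - lam_male) + (lam_male + lam_female - 1) * raw_ffr

# With perfect classification the correction is the identity:
assert corrected_ffr(0.5, 1.0, 1.0) == 0.5

# With the reported precisions (male 84.5%, female 65%), a raw FFR of 0.5
# is revised downward, reflecting the model's over-prediction of female faces:
print(round(corrected_ffr(0.5, 0.845, 0.65), 4))  # 0.4025
```

Period-specific λ, λ′ values simply mean calling this function with different parameters for each time slice.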

Key Findings

  • Overall under-representation: Mean FFR across all movies is 34.52% (σ=9.19), comparable to prior studies on TV and film speaking characters. FFR varies by genre (e.g., Crime ~31.3%, Romance ~37.1%).
  • Examples: Low FFR (<25%): Pirates of the Caribbean (2007), Star Wars (2005), The Matrix (2003), Independence Day (1996), Forrest Gump (1994) (~23%). Near parity (45–55%): The Hunger Games (2014), Jurassic World (2015), Rogue One (2016), Gravity (2013). Highest FFR: Bad Moms (2016) at 68%; also Sisters (2015), Life of the Party (2018), Cake (2014).
  • Correlation with Bechdel: Across top genres, FFR ordering aligns closely with Bechdel pass rates (Spearman > 0.93). Variation in FFR across genres is smaller in absolute values than Bechdel proportions but coherent in ordering.
  • Temporal trend: Average FFR increases from ~27% (1985–1998) to 44.9% (2014–2019), approaching parity. FFR ranges broaden: earlier films mostly fall between 20–45%, recent films between 35–65%, and the standard deviation rises from 5.1 to 7.6, suggesting greater diversity across films. Bechdel pass rates for overlapping films rise from 51% (1985–1998) to 60% (2014–2019), a +9 percentage-point change that tracks, but is smaller than, the +18-point FFR increase.
  • Audience and funding alignment (2014–2019): Highest budgets, grosses, and ratings cluster near the main FFR mode (~35%), reflecting the average under-representation. Elevated FFR (~60%) also includes relatively higher-budget and successful films. Proportion of IMDb ratings by women aligns strongly with higher-FFR bins (near-perfect ordering), indicating stronger female engagement with high-FFR films.
  • Single-face bias: Frames with only one face (the most common case) show a stronger male skew than overall FFR: 40% female vs. 60% male (vs. period average FFR 44.9%). Ordered frequencies of gender combinations are symmetric but consistently favor male-dominant configurations, reflecting the general 45–55 female–male imbalance.
  • Face-ism proxy: Median face area is 3.8% of frame for both genders; male median exceeds female by only 0.03%. Differences are statistically significant but extremely small and fluctuate by genre around zero, providing little evidence of systematic gender bias in facial prominence under this measure.
  • Mise-en-cadre differences: In mixed-gender frames, women appear more often in the middle third, men more often in the upper third. Effects are statistically significant across pairwise configuration comparisons (chi-square p<0.005), though magnitudes are small. Manual checks suggest part of the vertical placement difference arises from typical height differences between actors and actresses.
  • Algorithm performance: Face detection accuracy 92% (TP 977, FP 23, FN 137, TN 863). Gender inference overall accuracy 73.9%; precision for Female 65%, for Male 84.5%. Raw outputs overestimate female faces; period-specific confusion-based corrections were applied.
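The rule-of-thirds positioning analysis can be illustrated with a small helper that maps a face's bounding-box centre to a cell of the 3×3 grid (an illustrative sketch; the coordinate convention and names are assumptions, not the paper's code):

```python
def thirds_cell(cx, cy, frame_w, frame_h):
    """Map a face centre (cx, cy), in pixels with (0, 0) at the top-left,
    to a (row, col) cell of the 3x3 rule-of-thirds grid.
    Row 0 is the upper third, col 0 the left third."""
    col = min(int(3 * cx / frame_w), 2)  # clamp so cx == frame_w stays in-grid
    row = min(int(3 * cy / frame_h), 2)
    return row, col

# A face centred high in a 1920x1080 frame lands in the upper third (row 0),
# the placement the study found more common for men in mixed-gender frames:
assert thirds_cell(960, 150, 1920, 1080) == (0, 1)
# A face at the vertical centre lands in the middle third (row 1),
# the placement found more common for women:
assert thirds_cell(960, 540, 1920, 1080) == (1, 1)
```

Tallying these cells per gender configuration yields the position distributions that the chi-square tests compare.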

Discussion

The study demonstrates that automated large-scale visual analysis can quantify on-screen gender presence and reveal historical trends. The marked increase in FFR toward near parity since 1985 suggests meaningful changes in popular filmmaking practices regarding women’s on-screen visibility. The correlation between FFR and Bechdel outcomes indicates that FFR captures some semantic aspects of gender representation beyond mere counts. However, discrepancies with prior literature reporting stable under-representation may stem from differences in corpus selection (user-contributed, broader-than-top-grossing sets) and from the distinction between on-screen presence versus speaking time or narrative centrality. Audience and budget analyses show that mainstream success tends to cluster around the average under-representation level (~35% FFR), with some recent higher-FFR films achieving strong funding and performance. Framing analyses reveal minimal differences in facial prominence but subtle, statistically significant positioning asymmetries in mixed-gender scenes, partly attributable to height differences. These findings address the research question by establishing scalable, bias-corrected metrics of on-screen presence, tracing temporal evolution, and probing compositional biases, while highlighting where presence does not necessarily equate to influence or agency.

Conclusion

The paper contributes a scalable, computational methodology to appraise gender representativeness in popular films using automated face detection and gender inference, coupled with empirical bias assessment and correction. It documents a substantial temporal increase in women’s on-screen presence (FFR) since 1985, with recent years approaching parity, and shows coherence with Bechdel-based assessments across genres. It uncovers subtle framing asymmetries with small effect sizes and connects representational measures to audience engagement and funding. The approach is readily reproducible and extendable to other visual media (TV, ads). Future research should: combine large-scale automated measures with fine-grained qualitative analyses to assess narrative agency; incorporate speech detection and character identification to measure speaking time and centrality; improve body detection to better evaluate face-ism; and continue rigorous, context-specific evaluation and debiasing of machine learning models.


Limitations

  • Algorithmic bias: Gender classification exhibited asymmetric precision (Female 65%, Male 84.5%) and time-varying errors; corrections rely on confusion matrices and may not capture all context-specific biases (e.g., lighting, makeup, ethnicity).
  • Metric scope: FFR measures on-screen presence, not speaking time, narrative centrality, or portrayal quality; increases in FFR may not translate to improved agency or reduced stereotyping.
  • Framing proxies: Face-ism measured via face area lacks body detection and thus may miss body–face composition nuances.
  • Temporal restriction for framing analyses: Detailed mise-en-cadre and face-ism analyses were limited to 2014–2019 due to more symmetric and higher gender detection accuracy.
  • Sampling/corpus biases: Corpus built from YIFY and IMDb contributions may skew toward younger or more online-engaged audiences and differs from top-grossing-only samples.
  • Frame sampling: Uniform 2-second sampling may not fully align with narrative beats or shot salience; however, it improves representativeness over keyframe-only methods.
  • Generalizability: Results pertain to the selected corpus and period; cross-cultural and non-English-language representation nuances are not explicitly addressed.