Identifying gender bias in blockbuster movies through the lens of machine learning

M. J. Haris, A. Upreti, et al.

This study by Muhammad Junaid Haris, Aanchal Upreti, Melih Kurtaran, Filip Ginter, Sebastien Lafond, and Sepinoud Azimi examines gender bias in English-language blockbuster movies using natural language processing. The authors compare how male and female characters are portrayed through the emotions expressed in their dialogue, finding that men are written as more dominant and envious and women as more joyful. The method supports automated movie analysis and invites reflection on gender equality.

Introduction
The study investigates whether blockbuster English-language films encode gender bias in the emotional portrayal of male and female characters. Motivated by the influence of movies on public perceptions and the potential reinforcement of gender stereotypes, the authors analyse 34 highly popular films drawn from IMDb top lists to compare gender distributions over time and the sentiments and emotions expressed in character dialogues. The purpose is to move beyond coarse sentiment polarity to a richer, emotion-based representation, thereby offering an automated, reproducible approach to understanding gender inequality in media and informing efforts to reduce bias.
Literature Review
Prior work has examined gender gaps across domains and within film using tools like the Bechdel Test, with findings of persistent gender disparities yet gradual improvements in female centrality over time (Kagan et al., 2020). Xu et al. (2019) identified a Cinderella complex in books and films using word embeddings, showing female vectors oriented toward romance and male toward adventure. Studies on emotion in scripts have used Plutchik’s taxonomy; Yu et al. (2017) manually annotated Korean thriller scripts and compared with VADER, finding better alignment for anger and fear. Anikina (2017) combined machine learning and NRC lexicons for emotion detection, but noted challenges due to limited annotated resources in the movie domain and the difficulty of multi-label emotion classification. The Bechdel Test has limitations in capturing stereotypes and may fail even for female-oriented films. This study addresses these gaps by combining sentiment and multi-emotion analysis, moving beyond simple polarity and introducing a dialogue-to-emotion array grounded in Plutchik’s wheel.
Methodology
The workflow comprises three modules: (1) data processing, (2) emotion recognition, and (3) analysis.
Data processing: Scripts for 34 highly popular, globally influential English-language movies (1972–2021) across genres (romance, fantasy, fiction, drama, action) were collected from IMDb top lists, prioritizing script availability and processing compatibility. PDFs were converted to HTML, then parsed with BeautifulSoup and regex using indentation (style attributes 'left' and 'top') to segment scene elements and extract character-speaker-dialogue pairs into character dictionaries. Data were stored in JSON as {character: [dialogues]}. Character names were cleaned of screenplay annotations (e.g., V.O., O.S.), genders were tagged, and metadata (movie name, year) were added. Characters with fewer than five dialogues were dropped. The final dataset contained 26,279 dialogues and 457 characters (118 female, 339 male).
Emotion recognition: Sentiment analysis used Stanza to label each dialogue as positive, negative, or neutral; however, over 70% were neutral, limiting discrimination between genders. Validity was checked by manual annotation of 179 Harry Potter dialogues by two annotators, yielding Stanza–annotator agreement around 0.71–0.78 and inter-annotator agreement around 0.74 (as reported in the accuracy table). To capture richer affect, NRCLex was applied to each dialogue to obtain scores for eight primary emotions (anger, fear, sadness, disgust, joy, anticipation, trust, surprise). Dialogues were represented as normalized 8D emotion embeddings. Using Plutchik’s wheel, 24 secondary emotions were computed by averaging relevant primary pairs (e.g., envy from sadness and anger), producing a 32-dimensional embedding (8 primary + 24 secondary) per dialogue.
Analysis: Statistical tests (including Mann–Whitney U-test) assessed distributional differences between genders. Visualizations included box plots and t-SNE for dimensionality reduction to examine clustering patterns by gender. Unsupervised clustering (hierarchical and k-means) grouped characters by emotional similarity; gender ratios within clusters were compared to the overall sample ratio (approximately 3:1 male:female) to detect bias. Word clouds of non-overlapping nouns in male vs female dialogues highlighted thematic differences. The approach is modular, with loosely coupled components.
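The emotion-embedding step described above can be illustrated with a minimal sketch. It is not the authors' code: it swaps NRCLex for a tiny hypothetical lexicon (TOY_LEXICON), and it builds only four of the 24 secondary emotions. The envy pairing (sadness and anger) comes from the summary; the other three pairings follow the standard Plutchik dyads and are assumptions about what the paper uses. All function names are illustrative.

```python
from collections import Counter

PRIMARY = ["anger", "fear", "sadness", "disgust",
           "joy", "anticipation", "trust", "surprise"]

# Four of the 24 Plutchik dyads. Only "envy" is confirmed by the summary;
# the rest are standard Plutchik pairings, assumed here for illustration.
DYADS = {
    "envy": ("sadness", "anger"),
    "aggressiveness": ("anger", "anticipation"),
    "optimism": ("anticipation", "joy"),
    "dominance": ("anger", "trust"),
}

# Hypothetical stand-in for the NRC emotion lexicon (word -> primary emotions).
TOY_LEXICON = {
    "hate": ["anger", "disgust"],
    "lost": ["sadness"],
    "hope": ["anticipation", "joy", "trust"],
    "afraid": ["fear"],
}


def primary_embedding(dialogue, lexicon=TOY_LEXICON):
    """Count lexicon hits per primary emotion and normalize them to sum to 1."""
    counts = Counter()
    for word in dialogue.lower().split():
        for emotion in lexicon.get(word.strip(".,!?;:\"'"), []):
            counts[emotion] += 1
    total = sum(counts.values())
    return {e: (counts[e] / total if total else 0.0) for e in PRIMARY}


def full_embedding(dialogue, lexicon=TOY_LEXICON):
    """Append secondary emotions, each the mean of its two primary scores."""
    primary = primary_embedding(dialogue, lexicon)
    secondary = {name: (primary[a] + primary[b]) / 2
                 for name, (a, b) in DYADS.items()}
    return {**primary, **secondary}


emb = full_embedding("I hate that I lost.")
print(emb["envy"], emb["aggressiveness"])
```

With all 24 dyads listed in DYADS, the same function would return the 32-dimensional vector the summary describes.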
Key Findings
- Dataset: 34 films; 26,279 dialogues; 457 characters (118 female, 339 male).
- Sentiment polarity from Stanza showed >70% neutral classifications, offering limited differentiation between genders; Stanza’s agreement with human annotators was ~0.71–0.78, with inter-annotator agreement ~0.74.
- NRCLex-based emotion embeddings revealed gendered patterns: males scored higher on aggressiveness and dominance, while females scored higher on joy.
- Positive vs negative aggregate emotion proportions were not drastically different between genders, motivating analysis of specific emotions.
- t-SNE visualization showed female characters clustered centrally with less dispersion, suggesting narrower emotional diversity, while male characters were more sparsely distributed, indicating more diverse emotional portrayals.
- Word-cloud analysis of non-overlapping nouns indicated females more often used terms like kitchen, fashion, dress, skirt, sweetheart, madam, while males used time, business, war, world, man, home, reflecting stereotypical role portrayals.
- Clustering (hierarchical and k-means) produced groups with uneven gender ratios relative to the overall 3:1 baseline; some clusters exhibited imbalances up to approximately 6:1, indicating implicit bias in how character types are written.
- Overall, men were portrayed as more dominant and envious; women as more optimistic and joyful, aligning with societal stereotypes.
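The cluster-ratio comparison in the findings above reduces to simple arithmetic, sketched below. The baseline uses the reported character counts (339 male, 118 female). The cluster contents are made-up toy data, and the flagging threshold (factor=1.5) is an assumption for illustration, not a criterion taken from the paper.

```python
from collections import Counter


def male_to_female_ratio(genders):
    """Return the male:female ratio for a list of gender labels."""
    counts = Counter(genders)
    if counts["female"] == 0:
        return float("inf")
    return counts["male"] / counts["female"]


# Baseline from the reported dataset: 339 male and 118 female characters.
baseline = male_to_female_ratio(["male"] * 339 + ["female"] * 118)

# Hypothetical cluster memberships (toy data, not results from the study).
toy_clusters = {
    "cluster_a": ["male"] * 12 + ["female"] * 2,
    "cluster_b": ["male"] * 9 + ["female"] * 3,
}


def flag_imbalanced(clusters, baseline, factor=1.5):
    """Return clusters whose ratio exceeds baseline * factor (assumed cutoff)."""
    flagged = {}
    for name, genders in clusters.items():
        ratio = male_to_female_ratio(genders)
        if ratio > factor * baseline:
            flagged[name] = ratio
    return flagged


flagged = flag_imbalanced(toy_clusters, baseline)
print(round(baseline, 2), flagged)
```

The baseline works out to roughly 2.87, which the summary rounds to 3:1. In the toy data, cluster_a has a 6:1 ratio and is flagged, while cluster_b at 3:1 sits near the baseline and is not.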
Discussion
The findings support the hypothesis that blockbuster movie scripts encode gender bias in emotional portrayals. While coarse sentiment polarity is similar across genders, emotion-level analysis uncovers systematic differences aligned with stereotypes: males display higher dominance and aggressiveness, females higher joy and optimism. The t-SNE patterns and clustering imbalance suggest that female characters are written with less emotional diversity and more homogeneity, whereas male characters are afforded a broader range of emotional traits. The lexical themes in dialogues further reinforce stereotypical roles (domestic/clothing topics for women; business/war/world for men). These results indicate that apparent improvements in female representation may coexist with persistent, implicit emotional biases, highlighting the importance of multi-emotion analysis beyond binary sentiment to reveal nuanced forms of inequality in media narratives.
Conclusion
This study introduces an emotion-embedding approach grounded in Plutchik’s taxonomy to analyze gender representation in blockbuster movie scripts. By combining sentiment analysis with primary and secondary emotion scores and applying statistical tests and clustering, the work reveals implicit biases: men are portrayed as more dominant and envious, women as more joyful and optimistic, with female characters showing less emotional diversity. These findings suggest scriptwriters and producers should consciously examine character portrayals to respect individual variability independent of gender. The approach provides a scalable, automated alternative to manual analyses and moves beyond limitations of the Bechdel Test and simple polarity measures. Future research could extend the methodology to other cultural products such as music and conceptual art to investigate broader patterns of implicit gender bias.
Limitations