The film industry significantly influences societal beliefs and opinions. Blockbuster movies, designed to appeal to mass audiences, play a crucial role in shaping perceptions of gender roles and can reinforce existing stereotypes. This study investigates gender bias in blockbuster movies by analyzing the emotions expressed by male and female characters using natural language processing (NLP) techniques. The researchers selected approximately 30 English-language blockbuster films from IMDb and compared character portrayals in terms of their statistical distribution across time periods and the sentiment and emotions expressed in their dialogues. The goal is to present an approach that supports research on gender inequality in films and, through NLP, contributes to addressing this global issue and promoting positive social change.
Literature Review
Existing research on gender bias in media has employed various methods, including the Bechdel Test, which assesses female representation but has limitations in addressing stereotypes and capturing nuanced portrayals. Studies have used word embedding techniques to analyze gender stereotypes, but often focus solely on positive/negative affect. Kagan et al. (2020) analyzed movie social networks, finding a gender gap across genres, though noting improvement over time. Xu et al. (2019) identified emotional dependence of female characters on male characters, termed the Cinderella complex. Yu et al. (2017) analyzed emotions in Korean thriller scripts using manual annotation and sentiment analysis tools, comparing results with Plutchik's emotion wheel. Anikina (2017) used machine learning classifiers for emotion detection in movie dialogues, highlighting challenges related to data annotation. This study addresses limitations of previous work by expanding analysis to encompass sentiment analysis and embedded emotions beyond positive/negative affect, using Plutchik's emotion wheel to create a novel approach for analyzing dialogues.
Methodology
The study comprises three modules: data processing, emotion recognition, and analysis. The data processing module converts movie scripts from PDF to HTML and extracts the dialogues of each character into character dictionaries. Data cleaning removes extraneous text, tags each character's gender, adds movie details, and drops characters with fewer than five dialogues. The emotion recognition module first applies Stanza (a Python NLP package) for sentiment analysis (positive, negative, neutral). Because of the limitations of sentiment labels alone, NRCLex (a rule-based emotion detection model) supplies scores for the eight primary emotions (fear, anger, trust, surprise, sadness, disgust, joy, anticipation), and Plutchik's wheel of emotions is used to compute 24 secondary emotions from them. Each dialogue is thus represented as a 32-dimensional vector (8 primary + 24 secondary emotions). The analysis module combines statistical tests (the Mann-Whitney U-test and t-tests) with machine learning techniques (clustering and classification) to identify potential biases and compare the emotions of male and female characters. Hierarchical and k-means clustering are used to examine whether male and female characters are distributed unevenly across character types. A qualitative analysis with word clouds shows which words each gender uses most often.
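The 32-dimensional representation described above can be illustrated with a minimal sketch. The summary does not say how the study combines two primary scores into a secondary emotion, so the averaging rule below is an assumption, as are the function names and the input format (a plain dictionary of primary scores, such as one derived from NRCLex output). The 24 secondary emotions are taken to be the pairs of primary emotions that sit one, two, or three steps apart on Plutchik's wheel.

```python
# Primary emotions in their order around Plutchik's wheel.
WHEEL = ["joy", "trust", "fear", "surprise",
         "sadness", "disgust", "anger", "anticipation"]


def dyad_pairs():
    """Return the 24 pairs of primary emotions 1, 2, or 3 steps apart."""
    pairs = []
    for step in (1, 2, 3):
        for i, first in enumerate(WHEEL):
            second = WHEEL[(i + step) % len(WHEEL)]
            pairs.append((first, second))
    return pairs


def emotion_vector(primary):
    """Build a 32-dimensional vector: 8 primary scores, then 24 dyad scores.

    `primary` maps emotion names to scores; missing emotions count as 0.
    ASSUMPTION: a dyad score is the mean of its two primary scores.
    The study may use a different combination rule.
    """
    base = [float(primary.get(name, 0.0)) for name in WHEEL]
    combined = [
        (primary.get(a, 0.0) + primary.get(b, 0.0)) / 2.0
        for a, b in dyad_pairs()
    ]
    return base + combined


if __name__ == "__main__":
    # Hypothetical primary scores for a single line of dialogue.
    scores = {"anger": 1.0, "anticipation": 0.5}
    vector = emotion_vector(scores)
    print(len(vector))
```

Opposite emotions on the wheel (four steps apart, such as joy and sadness) are deliberately left out, which is what leaves exactly 24 pairs.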
Key Findings
The analysis of 34 blockbuster movies revealed that male characters score higher on emotions such as aggressiveness and dominance, while female characters score higher on joy. Box plots illustrate these differences. t-SNE plots show that female characters are more tightly clustered in emotion space, suggesting less diverse character portrayals than for males. Word cloud analysis shows that female character dialogues frequently include words associated with domesticity and appearance (kitchen, fashion, dress, skirt), while male characters use words related to ambition and broader societal concerns (time, business, war, world). Clustering results from both hierarchical and k-means methods show an uneven distribution of male and female characters across character clusters, indicating a significant bias in how characters are written and categorized.
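A group comparison of this kind can be sketched with a Mann-Whitney U-test on one emotion score at a time. The version below is a self-contained illustration, not the study's code: the scores are made-up placeholders, the function name is hypothetical, and the p-value uses a normal approximation without tie or continuity correction. In practice a library routine such as scipy.stats.mannwhitneyu would be the usual choice.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic for sample_a, two-sided p-value).

    U counts the pairs in which a value from sample_a exceeds a value
    from sample_b, with ties counted as one half. The p-value comes from
    a normal approximation with no tie or continuity correction, so it
    is only a rough guide for small samples.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("both samples must be non-empty")

    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5

    mean_u = n_a * n_b / 2.0
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_value


if __name__ == "__main__":
    # Placeholder per-character scores for one emotion, NOT data from the study.
    male_scores = [0.42, 0.55, 0.61, 0.38, 0.70]
    female_scores = [0.30, 0.25, 0.44, 0.33, 0.29]
    u, p = mann_whitney_u(male_scores, female_scores)
    print(f"U = {u}, p = {p:.4f}")
```

A rank-based test suits emotion scores because they are bounded and often skewed, so it avoids the normality assumption a t-test would make.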
Discussion
The findings confirm the presence of gender bias in blockbuster movies, despite improvements over time in female representation. The consistent patterns of emotional portrayal highlight implicit biases that reinforce societal stereotypes. While the overall number of women in films might increase, the emotional range and diversity of character portrayals remain skewed, implying that simply having more female characters is insufficient to address the underlying biases. The uneven distribution of male and female characters across different clusters underscores the need for a conscious effort to create more diverse and nuanced female characters.
Conclusion
This study demonstrates the presence of implicit gender bias in the representation of characters in blockbuster movies, despite some progress over time. The analysis of emotions, combined with clustering techniques, reveals consistent patterns reflecting societal stereotypes. Future research could explore these biases in other forms of media such as music and conceptual art, further expanding understanding of how media contributes to shaping gender perceptions.
Limitations
The study's sample size, while substantial, might not fully capture the diversity of the global film industry. The reliance on readily available scripts might introduce bias towards movies with publicly accessible scripts. The accuracy of emotion detection tools, while relatively high, might still introduce some degree of error.