Ethnic Representation Analysis of Commercial Movie Posters

D. Kagan, M. Levy, et al.

This study by Dima Kagan, Mor Levy, Michael Fire, and Galit Fuhrmann Alpert uses deep learning to examine ethnic bias in roughly 125,000 commercial movie posters. It finds a positive trend: posters for recent English-speaking films increasingly reflect the ethnic diversity of the actual US population. The authors propose an automated method for monitoring ethnic representation in the film industry.

Introduction

The film industry’s advertising through posters plays a critical role in shaping cultural and societal perceptions, often presenting a reality perceived as authentic. With rising global attention to diversity, including movements like Black Lives Matter, this study investigates ethnic representation on movie posters as a proxy for industry bias and change over time. The authors hypothesize: (1) minority representation on posters has improved over time but began from underrepresentation, (2) poster design aspects (face size and location) are influenced by actor ethnicity, with minorities depicted smaller and less centrally, (3) the ethnicity of the leading (largest) actor affects the ethnic composition of other depicted actors, (4) representation varies by genre, and (5) in recent years, minorities may have been added to posters to mitigate bias. To test these, the study analyzes a large corpus of posters with automated computer vision methods and links results to US population demographics.

Literature Review

Prior work documents persistent under-representation of minorities in film, both on screen and behind the camera. Smith et al. (2014) reported that only 25.9% of speaking characters in 600 popular films (2007–2013) were from minority groups despite minorities comprising 37% of the US population. Studies highlight stereotype reinforcement (e.g., hypersexualization of Hispanic females) and institutional barriers, including limited minority leadership and decision-making roles (Hennekam & Syed, 2018; Smith et al., 2020). Netflix’s US scripted content showed progress toward proportional representation in leads/co-leads by 2019 (Smith et al., 2021), though gains were uneven across groups. Poster-focused research has typically been small-scale and manual, exploring gender and power portrayals (Aley & Hahn, 2020; Gabriel, 2012) and specific ethnic narratives (Freire, 2019; Rahmasari, 2014). Broader advertising studies show complex audience effects and potential bias reinforcement (Johnny & Mitchell, 2006; De Run, 2005; Baumgartner & Laghi, 2012). Machine learning enables large-scale analysis of images, but face/attribute datasets are often biased toward lighter skin, raising fairness concerns (Merler et al., 2019; Mehrabi et al., 2021). This motivates careful model evaluation when studying diversity using automated methods.

Methodology

Data sources and curation: Movies and metadata were collected from IMDb (open datasets for titles, genres, and ratings) and TMDB (posters, actor images, and country data). From IMDb’s catalog, non-animated movies with at least 1,000 ratings were retained, yielding ~35,000 movies (17,187 English-speaking used for analysis). For each movie, all official TMDB posters plus the IMDb main poster were retrieved via APIs, initially totaling 286,654 posters. Duplicate and near-duplicate posters of the same movie were removed using dhash, treating two posters as duplicates when the Hamming distance between their hashes was below 16. This left 125,439 non-duplicate posters (72,971 for English-speaking movies).

Actor dataset: For each movie’s cast (via IMDb), actor IDs, names, and cast rank were collected. Up to three profile photos per actor were downloaded from IMDb/TMDB, yielding 118,136 actors with 217,575 images. Grayscale images were filtered out (channel MSE criterion), leaving 101,873 actors and 179,858 images.

US demographics: Census data (five categories) were used for normalization; Indian was merged under Asian per census definitions.

Feature extraction pipeline:
(1) Poster face detection: RetinaFace detected faces on posters; posters without faces were discarded. This yielded 77,192 posters with faces (45,613 English-speaking).
(2) Actor headshot face detection and embedding: RetinaFace detected faces in actor photos; ArcFace generated embeddings.
(3) Actor ethnicity classification: Poster faces were not classified directly, because colorization, unusual angles, and artistic effects make them unreliable inputs. Instead, the method first identified the actors and then assigned ethnicity from their photos using FairFace models (4-class: Asian, Black, Indian, White; 7-class: White, Black, Latino-Hispanic, East Asian, Southeast Asian, Indian, Middle Eastern). For each actor, per-image class probabilities were averaged across up to three photos, and the class with the maximum average probability was assigned as the actor’s ethnicity.
(4) Poster-actor matching: Poster face embeddings were matched to actor embeddings by nearest neighbor (Euclidean distance). Two approaches were evaluated: comparing to (a) the entire cast list and (b) the top-10 cast only.

Model evaluation: Face detection was validated by manual inspection of more than 100 faces on random posters (RetinaFace achieved 100% detection on the sample). Face recognition was validated on 50 posters (149 faces) by manual verification of matched pairs; all manually checked matches were verified as correct under both cast-matching approaches (100% verification), with identification rates of 71% (whole cast) vs 69% (top-10). Ethnicity classification was assessed via manually labeled samples. The 4-class model achieved average precision 92.85% (per-class precision: Asian 100%, Black 100%, Indian 100%, White 71.42%; recall: Asian 100%, Black 90%, Indian 70%, White 100%). The 7-class model achieved average precision 61.37% (per-class precision: Black 100%, East Asian 60%, Southeast Asian 66.66%, White 55.55%, Indian 66.66%, Latino-Hispanic 30.76%, Middle Eastern 50%; recalls varied 20–100%).

Analysis scope: More than 45,000 posters from 24,062 English-speaking US movies (1960–2021) were analyzed for trends in representation over time, face size and position, co-appearance conditioned on the largest face’s race, genre-specific distributions, and representation by cast rank positions.
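
The deduplication, ethnicity-assignment, and matching steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: image hashes are assumed to be already computed and stored as integers, embeddings are plain lists of floats, the greedy keep-first deduplication order is a guess, and `max_distance` is a hypothetical cut-off that the summary does not specify. All function names are illustrative.

```python
import math


def hamming(h1: int, h2: int) -> int:
    """Number of differing bits between two integer image hashes."""
    return bin(h1 ^ h2).count("1")


def dedupe(hashes, threshold=16):
    """Return the indices of the posters to keep for one movie.

    A poster is dropped when its hash is closer than `threshold` bits
    (Hamming distance < 16) to a poster that was already kept.
    Assumption: posters are processed greedily in list order.
    """
    kept = []
    for i, h in enumerate(hashes):
        if all(hamming(h, hashes[j]) >= threshold for j in kept):
            kept.append(i)
    return kept


def assign_ethnicity(per_image_probs):
    """Average class probabilities over an actor's photos, then take the argmax.

    per_image_probs: one dict (label -> probability) per photo, up to three.
    """
    labels = list(per_image_probs[0])
    n = len(per_image_probs)
    avg = {label: sum(p[label] for p in per_image_probs) / n for label in labels}
    return max(avg, key=avg.get)


def match_face(poster_emb, cast_embs, max_distance=None):
    """Nearest-neighbor match of one poster face to the cast (Euclidean distance).

    cast_embs: dict mapping actor id -> list of headshot embeddings.
    max_distance: hypothetical rejection cut-off (not given in the text).
    Returns (actor_id, distance), or None if nothing qualifies.
    """
    candidates = [
        (actor, math.dist(poster_emb, emb))
        for actor, embs in cast_embs.items()
        for emb in embs
    ]
    if not candidates:
        return None
    best = min(candidates, key=lambda pair: pair[1])
    if max_distance is not None and best[1] > max_distance:
        return None
    return best
```

In this sketch, the "top-10 cast only" variant amounts to passing a `cast_embs` dictionary restricted to the ten highest-ranked actors before calling `match_face`.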

Key Findings
  • Persistent but decreasing White dominance: Across posters, White actors remain overrepresented; overall, about 79% of depicted actors were White, 1.14 times their share of the US population. However, minority representation has increased steadily since the 1960s, and by 2020–2021 poster representation for English-speaking films was almost perfectly balanced relative to the composition of the US population.
  • Leading roles vs minor roles: Non-White actors are more likely to appear in minor roles on posters. Among the top-3 cast-listed actors shown on posters, only 9.2% were minorities; in ranks 4–12, 16.2% were non-White.
  • Visual prominence: On average, White actors’ faces are 25% larger relative to the largest face on the poster than those of other races. White actors also tend to be closer to poster centers, though center-distance differences have narrowed substantially in recent decades.
  • Number of actors on poster: Minorities have a higher likelihood of appearing on posters that feature many actors (e.g., more than six).
  • Conditional co-appearance: Regardless of the largest actor’s race, White actors have the highest probability of being the second-largest face. When the largest actor is non-White, the next most probable race among other faces is the same as the largest face’s race (homophily).
  • Genre effects: White actors are the most frequent across all genres, with especially high shares in Film-Noir (~98%), Western (~94%), and Mystery (~93%). The highest shares of Black actors are in Sports, Music, Action, Crime, and Documentary. Asian actors are relatively more represented in Action. Indian and Asian categories have generally low shares (max ~9%). Documentary appears most diverse among genres.
  • Cast rank patterns: The probability of appearing on a poster decreases nearly exponentially with increasing cast rank. White representation decreases with rank, while Black representation increases with rank, indicating that White actors are more likely to hold higher-ranked (lead) positions.
  • Model performance: RetinaFace achieved 100% detection on the sampled posters; poster-to-actor matching had 100% manual verification, with identification rates of 71% (whole cast) vs 69% (top-10). The FairFace 4-class ethnicity model outperformed the 7-class model (average precision 92.85% vs 61.37%).
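
Several of the headline numbers above follow from simple arithmetic, which the sketch below reproduces. It rests on assumptions that the summary does not confirm: that the representation factor is the poster share divided by the population share, that "average precision" is the unweighted mean of the per-class precisions, and that face size is measured as bounding-box area. The function names and the `(x, y, width, height)` box format are illustrative.

```python
from statistics import mean


def representation_ratio(poster_share, population_share):
    """Poster share divided by population share; 1.0 would mean parity."""
    return poster_share / population_share


def relative_face_sizes(boxes):
    """Area of each face box divided by the largest face area on the poster.

    boxes: list of (x, y, width, height) tuples (assumed format).
    """
    areas = [w * h for (_x, _y, w, h) in boxes]
    largest = max(areas)
    return [area / largest for area in areas]


# Per-class precisions (in percent) as reported for the two FairFace models.
PRECISION_4 = {"Asian": 100.0, "Black": 100.0, "Indian": 100.0, "White": 71.42}
PRECISION_7 = {
    "Black": 100.0, "East Asian": 60.0, "Southeast Asian": 66.66,
    "White": 55.55, "Indian": 66.66, "Latino-Hispanic": 30.76,
    "Middle Eastern": 50.0,
}

# Unweighted (macro) averages of the per-class precisions.
macro_4 = mean(list(PRECISION_4.values()))
macro_7 = mean(list(PRECISION_7.values()))

# White population share implied by "79% of depicted actors" and "1.14x";
# this value is derived here, not stated in the summary.
implied_white_population_share = 0.79 / 1.14
```

Under these assumptions, `macro_4` and `macro_7` land within 0.01 of the reported 92.85% and 61.37%, which suggests the reported averages are macro averages over classes. The implied White population share comes out at a little over 69%.
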
Discussion

The findings confirm the initial hypotheses: minority representation on posters has improved markedly over time, approaching demographic parity in recent years for English-speaking US movies, yet White actors maintain visual and positional prominence suggestive of residual bias in leading roles and poster design. The larger and more central depictions of White actors likely reflect the continued concentration of lead roles among White actors. Increased minority presence, particularly of Black and Asian actors in the last decade, coincides with heightened diversity awareness (e.g., BLM) and industry globalization (e.g., targeting Chinese markets), though normalization by US demographics shows broad improvements across minorities rather than targeted over-representation. The higher likelihood of minority appearance on ensemble posters (many actors) may reflect franchise-driven ensemble casts and/or strategic inclusion to mitigate criticism. Homophily patterns (a greater likelihood of co-appearing actors sharing the largest face’s ethnicity when that face is non-White) suggest genre- or plot-driven casting or targeted marketing. Genre analyses indicate persistent stereotypes (e.g., Asian actors in action, Black actors in crime), while documentaries exhibit the highest diversity, likely because they depict real-world subjects. Overall, automated poster analysis reveals how visual marketing mirrors and shapes casting hierarchies and inclusion, answering the posed research questions across time, design, co-appearance, genre, and recent inclusion dynamics.

Conclusion

This work introduces the first large-scale, computer vision-based framework and dataset for analyzing ethnic representation in movie posters, comprising 125,439 non-duplicate posters and linked actor identities and ethnicities. Results show that White actors are, on average, larger and more centrally placed, minorities increasingly appear on posters (notably Black and Asian actors), and in the past two years, English-speaking movie posters nearly match US ethnic demographics. The main character’s race influences the ethnicity of other depicted actors. The open-source dataset and tools enable continuous, automated monitoring of diversity in film marketing. Future directions include comparative analyses of English vs non-English markets and cross-country poster designs, examining links between decision-makers’ identities and poster content, developing a poster-based centrality measure of cast rank, and identifying mismatches where lower-ranked actors receive disproportionate poster prominence (potentially indicating compensatory representation).

Limitations
  • Dependence on pre-trained models: Ethnicity classification and face recognition are limited by biases and performance of existing models (e.g., FairFace), especially for darker skin tones and fine-grained ethnic categories; the 7-class model showed substantially lower precision.
  • Data constraints for older films: Older black-and-white posters and limited non-White samples reduce classification reliability.
  • Image quality: Poster resolution and small face sizes hinder accurate detection/recognition; super-resolution could help but requires further validation.
  • Poster availability/variation: The number of poster variants per movie and their print/distribution volumes are unknown, potentially biasing representation measures toward available digital versions rather than audience exposure.
  • Demographic normalization: Analyses rely on US census categories and lack comprehensive worldwide ethnic demographic data for broader market comparisons.
  • Cast list ordering noise: IMDb cast ranking inconsistencies in some titles may affect rank-based analyses, though matching accuracy remained high.