Online images amplify gender bias

D. Guilbeault, S. Delecourt, et al.

This study by Douglas Guilbeault, Solène Delecourt, Tasker Hull, Bhargav Srinivasa Desikan, Mark Chu and Ethan Nadler examines how online images amplify gender bias, revealing that bias is more pronounced in visuals than in text. The research highlights the pressing need to address the societal implications of the shift toward visual communication in order to build a fair and inclusive internet.
Introduction

Images increasingly pervade online information and daily communication, with search engines and social platforms shifting toward visual content as reading time declines. Historical and psychological perspectives suggest images can strongly reinforce social biases due to their memorability, emotional salience and direct transmission of demographic cues. Unlike text, which can omit gendered references, images often make gender salient in depictions of social categories, potentially amplifying bias. Despite the growth of online images, most quantitative work on online gender bias has focused on text, leaving open whether and how images differ from text in the prevalence and psychological impact of gender bias. This study tests the hypothesis that online images amplify gender bias relative to text, both statistically and in their influence on people’s beliefs.

Literature Review

Prior research has documented human-like biases in textual corpora and word embeddings derived from large-scale online text (e.g., Caliskan et al., Garg et al., Charlesworth et al.). A smaller set of studies examined gender bias in Google Images using limited occupational samples, without systematically comparing images to text or assessing psychological impact. Psychological literature on the picture superiority effect indicates that images are more memorable and emotionally evocative than text and may underpin text comprehension, suggesting a potent channel for bias transmission. Additionally, images inherently convey demographic cues (e.g., perceived gender), whereas text can employ gender-neutral phrasing, implying higher salience of gender in images. These strands collectively motivate examining multimodal (image vs text) differences in bias prevalence and impact.

Methodology

Observational analyses: The authors assembled 3,495 social categories from WordNet (occupations and social roles). For each category, they collected the top 100 Google Images results in August 2020 using fresh Google accounts and 10 servers in New York City (349,500 images in total), with replications using IPs in Amsterdam, Bangalore, Frankfurt, Singapore and Toronto. Additional robustness samples included gender-specific Google searches (e.g., “female doctor”), yielding 491,169 extra images. Wikipedia images were obtained via the WIT dataset for 1,523 categories; celebrity images were analyzed using the IMDb–Wikipedia face dataset (511,946 images). Faces were automatically detected and cropped (OpenCV). A total of 6,392 US-based, English-fluent MTurk coders labeled face gender as female/male/non-binary; the modal label across three coders determined perceived gender (non-binary responses comprised 2% and were excluded). Intercoder agreement was high (91% unanimous; Gwet’s AC = 0.48). For each category, the gender association in images was computed as a normalized score from −1 (100% female) to 1 (100% male), with 0 indicating parity.

Text analyses: The study measured text-based gender associations using word embeddings, primarily the Google News word2vec model (2013; >100B words), supplemented by a word2vec model retrained on 2.7M news articles from 2021–2023 and by other embeddings (GloVe, BERT, FastText, ConceptNet, GPT-3). Following Kozlowski et al., the authors constructed a gender dimension from clusters of gendered terms (female: woman, her, she, female, girl; male: man, his, he, male, boy). Each category’s position on the −1 (female) to 1 (male) axis was derived from the difference in average cosine distances to the female versus male clusters. To align ranges with the image-based measure, min–max normalization was applied within male- and female-skewed subsets and the results combined to span −1 to 1.
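The two gender-association measures above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: image_gender_score implements the −1 (all female) to 1 (all male) proportion score from coded face labels, and embedding_gender_association computes a category's position on the gender axis as the difference of mean cosine distances to the female versus male term clusters, following the Kozlowski et al. approach described above. The function names and toy inputs are hypothetical, and the final min–max rescaling step is omitted.

```python
import numpy as np

# Gendered term clusters used to define the text-based gender axis.
FEMALE_TERMS = ["woman", "her", "she", "female", "girl"]
MALE_TERMS = ["man", "his", "he", "male", "boy"]

def image_gender_score(face_labels):
    """Normalized image score: -1 (100% female) to 1 (100% male), 0 = parity.

    `face_labels` are the coders' modal labels for the faces in a category.
    """
    n_male = sum(1 for lab in face_labels if lab == "male")
    n_female = sum(1 for lab in face_labels if lab == "female")
    return (n_male - n_female) / (n_male + n_female)

def cosine_distance(u, v):
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def embedding_gender_association(category_vec, embeddings):
    """Difference of mean cosine distances to the female vs male clusters.

    Negative values indicate a female-skewed category, positive male-skewed.
    `embeddings` maps each gendered term to its word vector.
    """
    d_female = np.mean([cosine_distance(category_vec, embeddings[w])
                        for w in FEMALE_TERMS])
    d_male = np.mean([cosine_distance(category_vec, embeddings[w])
                      for w in MALE_TERMS])
    return d_female - d_male
```

In the study, the raw embedding scores are then min–max normalized separately within the male- and female-skewed subsets so that both measures span −1 to 1 and are directly comparable.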
Of the WordNet categories, 2,986 had embeddings and were used for paired image–text comparisons.

Comparative benchmarks: The authors compared image- and text-based gender associations with (a) public opinion, for which 2,500 US MTurk raters provided category-level gender-association ratings on a −1 to 1 slider (20 raters per category), and (b) 2019 US Bureau of Labor Statistics occupational gender composition (n = 685 matched occupations).

Experimental study: A preregistered online experiment recruited a nationally representative US sample via Prolific (target n = 600; 575 completed; primary analyses n = 423 across the Image, Text (Google News) and Control conditions). Participants were randomized to search Google Images for occupation images (Image), Google News for textual descriptions (Text), or to upload descriptions of unrelated categories (Control). Each participant completed 22 trials, each involving a randomly selected occupation from a set of 54 spanning STEM and the arts. After each upload, participants rated the occupation’s gender association on the same −1 to 1 scale (explicit ratings). Independent annotators labeled the uploaded images/texts as male/female/neutral to quantify the gender association of the stimuli encountered. Participants then completed an Implicit Association Test (IAT) measuring implicit associations of male–science versus female–liberal arts; D scores were computed using standard pooled-SD procedures. Immediate and 3-day follow-up IATs were administered in preregistered work. Analyses included Wilcoxon tests, t-tests and correlations, with robustness checks controlling for category features, search frequency, number of faces/images, image ranking, face-cropping, duplicates and media type; cross-model replications were also performed.
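The pooled-SD D-score computation mentioned above can be illustrated with a minimal, stdlib-only sketch. The helper name and sample latencies are hypothetical: D is the difference in mean response latency between incongruent and congruent IAT blocks, divided by the pooled standard deviation of all trials. The full scoring algorithm (Greenwald et al.) also applies trial filtering and error penalties, which are omitted here.

```python
import statistics

def iat_d_score(congruent_ms, incongruent_ms):
    """Simplified IAT D score.

    (mean incongruent latency - mean congruent latency), divided by the
    pooled standard deviation of all trials from both blocks. Positive
    values indicate stronger bias (slower incongruent responses).
    """
    pooled_sd = statistics.stdev(congruent_ms + incongruent_ms)
    mean_diff = statistics.mean(incongruent_ms) - statistics.mean(congruent_ms)
    return mean_diff / pooled_sd
```

For the male–science/female–liberal arts IAT used in the study, a participant who is systematically slower when pairing female with science than male with science yields D > 0.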

Key Findings

Observational correlations and prevalence: Image- and text-based category gender associations were positively correlated (r = 0.5, P < 0.0001; n = 2,986). However, associations were significantly more extreme in images for both female-skewed and male-skewed categories (P < 0.0001, Wilcoxon signed-rank; n = 2,986).

Underrepresentation of women: Average male bias was weak in text (μ = 0.03, P < 0.0001) but over four times stronger in images (μ = 0.14, P < 0.0001); mean difference = 0.11 (P < 0.0001; n = 2,986). The proportion of male-skewed categories was 56% in text versus 62% in images (P < 0.0001). The inequality persisted with a deep-learning gender classifier and under gender-specific searches.

Public opinion and census benchmarks: Relative to human judgements, texts underestimated male bias by −0.084 (P < 0.001), whereas images overestimated it by 0.025 (P < 0.001) (n = 2,986). For 2019 US census occupations (n = 685), text-based associations were neutral (μ ≈ 0, P = 0.65) and less male than both the census (census μ = 0.08, P < 0.001) and images (image μ = 0.15, P < 0.001). Images showed significantly greater male bias than the census for the same occupations (mean difference = 0.07, P < 0.001).

Experimental effects on explicit beliefs: Participants’ uploaded descriptions were more gendered in the Image than the Text condition (mean difference = 0.42, P < 0.0001). Exposure to images increased the absolute strength of explicit gender associations relative to Text (mean difference = 0.06, P < 0.001) and Control (mean difference = 0.06, P < 0.001); Text and Control did not differ (P = 0.56). For example, “model” was rated more strongly female in the Image (μ = −0.62) than the Text condition (μ = −0.32). The gender associations in participants’ uploads strongly predicted their explicit ratings across occupations (r = 0.79, P < 0.0001), as did the strength of those associations (r = 0.56, P < 0.0001).
Holding prevalence constant (only gendered uploads), images primed stronger explicit bias than texts: uploads gendered as images versus texts led to higher reported bias (μ = 0.41 vs 0.35; mean difference = 0.06; t = 4.58; P < 0.0001; n = 54 occupations).

Experimental effects on implicit bias: All conditions showed significant implicit bias associating men with science and women with liberal arts (P < 0.0001). The Image condition exhibited higher implicit bias than Control (mean difference = 0.11, P = 0.005). Text vs Control was not significant (mean difference = 0.06, P = 0.24), and Image vs Text did not reach conventional significance (mean difference = 0.05, P = 0.09). Stronger explicit bias correlated with stronger implicit bias across participants (P < 0.0001), and only the Image condition showed significantly stronger implicit bias than Control three days later, suggesting enduring effects.

Replications and robustness: Findings replicated across the Wikipedia and IMDb image datasets; multiple word-embedding models (GloVe, FastText, BERT, GPT-3, ConceptNet); an alternate news corpus (2021–2023 word2vec); varied IP locations; and controls for linguistic/category features, search frequency, number of images/faces, image ranking and media type.

Discussion

The study directly addresses whether the rise of online images amplifies gender bias compared to text. Despite correlated gender associations across modalities, images exhibit systematically stronger male- and female-typed biases and a greater overall underrepresentation of women than texts, public opinion, and census benchmarks. Experimentally, exposure to image-based search results both increased the strength of explicit gendered beliefs and, to a suggestive extent, amplified implicit gender-science associations relative to text and control conditions, consistent with psychological mechanisms such as the picture superiority effect and higher salience of demographic cues in images. These results indicate that images do not merely reflect societal distributions; they distort them towards stronger gender typing, with potential social consequences for both women (especially in male-typed domains) and men in female-typed roles. The findings are particularly salient given the growth of image-centric platforms and the integration of images in search interfaces, as well as the proliferation of text-to-image AI models trained on web images that may inherit and propagate these biases. Understanding how content sources (blogs, business/news, stock photography) and human preferences for prototypical representations contribute to amplification can inform interventions in platform design, content moderation, and AI training pipelines.

Conclusion

Online images amplify gender bias compared to text in both prevalence and psychological impact. Large-scale multimodal analyses and a preregistered experiment show that images overrepresent male associations relative to texts, public opinion and census data, and that viewing images increases explicit—and suggestively implicit—gender biases. With the internet’s increasing visual orientation and the rise of image-generative AI, these effects may further entrench social stereotypes. Future research should investigate social and algorithmic drivers of image-based bias across gender, race and other demographics; analyze additional modalities (audio/video); compare human- and AI-generated content; and develop multimodal computational social science frameworks to monitor and mitigate bias. Addressing the societal impact of visual culture is essential to foster a fair and inclusive internet.

Limitations

Implicit bias results are treated as suggestive amid ongoing debates about IAT reliability; explicit bias is emphasized as the primary, robust outcome. Gender in images reflects perceived gender by annotators rather than self-identification; non-binary labels (≈2%) were excluded from main analyses. Annotators were US-based, English-fluent MTurk workers, which may limit generalizability across cultures. Image data were collected in August 2020 and may not capture temporal changes. Some occupational and category mappings rely on WordNet and available embeddings (2,986/3,495 categories matched), and the census comparison is limited to 685 occupations. Despite extensive robustness checks, residual confounding from content sources, search algorithms, and user behavior may remain.
