Online images amplify gender bias

Sociology

D. Guilbeault, S. Delecourt, et al.

This study by Douglas Guilbeault, Solène Delecourt, Tasker Hull, Bhargav Srinivasa Desikan, Mark Chu, and Ethan Nadler examines how online images contribute to the spread of gender bias. Analyzing more than one million images, the authors find that gender bias is markedly more pronounced in visual media than in text, underscoring the need to confront the societal impact of visual communication.
Introduction

The study investigates how the rapid shift from text to images in online communication affects societal gender bias. With images dominating search engines, social media, news, and advertising—and with people spending less time reading and more time viewing images—the authors ask whether visual content amplifies gender bias compared to text. Psychological research suggests images are processed more rapidly, memorably, and emotionally, and make demographic cues (such as gender) more salient than text, which can use gender-neutral phrasing. This motivates the hypothesis that, relative to text, online images both exhibit stronger statistical gender bias and more powerfully shape users’ gendered beliefs.

Literature Review

Prior quantitative work on online gender bias has largely examined text corpora, including large-scale word embedding analyses that reveal human-like or historical gender biases in language. Only a few studies have assessed gender bias in Google Images, typically with small samples of occupations and a few thousand images, and without systematic comparisons between images and text or tests of psychological impact. Psychological literature documents the picture superiority effect—images are more memorable and emotionally evocative than text—and emphasizes that images can underlie text comprehension. Images also convey demographic cues (including gender) more saliently than text, reducing the ability to avoid gendered interpretations through neutral language. These strands of research imply that images are a potent vector for transmitting and reinforcing gender stereotypes online.

Methodology

The authors combined a large-scale multimodal observational analysis with preregistered experiments.

  • Image data: For 3,495 social categories from WordNet (occupations and social roles), the top 100 Google Image results per category were collected via fresh accounts with no history using ten servers in New York City in August 2020 (total 349,500 images). Replications used additional gender-specific searches (adding 491,169 images), and IPs from Amsterdam, Bangalore, Frankfurt, Singapore, and Toronto.
  • Image annotation: 6,392 US-based, fluent-English MTurk coders labeled perceived gender of faces in images as female, male, or non-binary (modal label across three coders; 2% non-binary judgments excluded). Unanimous agreement occurred for 91% of images; Gwet’s AC = 0.48 indicated satisfactory reliability. An image-based gender association score was computed per category: normalized to −1 (100% female) to 1 (100% male), 0 = 50/50.
  • Text data: Gender association in text was measured using word embeddings, primarily word2vec trained on the 2013 Google News corpus (>100B words). A gender dimension captured co-occurrence with gendered terms, placing each category on a −1 (female) to 1 (male) axis; min–max normalization aligned scales with image-based measures. Robustness checks used alternative embeddings (GloVe, BERT, FastText, ConceptNet, GPT‑3), and a custom 2021–2023 news word2vec model.
  • Scope alignment: 2,986 categories were matched between the image and text measures for the primary comparisons (image-based results were robust when all 3,495 categories were included).
  • Statistical comparisons: Three dimensions of bias were assessed—(1) strength of gender association per category, (2) overall representation balance of women vs men across categories, and (3) correspondence with public opinion and 2019 US Bureau of Labor Statistics occupational gender distributions (n = 685 occupations matched).
  • Public opinion: 2,500 MTurk coders rated each category on the same −1 to 1 scale.
  • Experiment (explicit and implicit bias): Preregistered, nationally representative US sample from Prolific (n = 450 randomized; 423 completed). Participants were assigned to: Image condition (Google Images), Text condition (Google News), or Control condition (unrelated categories) and asked to search and upload descriptions (images or text) for 22 randomly selected occupations from a set of 54, then rate the gender they associate with each occupation on a −1 to 1 scale. Uploaded materials were annotated for gender (image focal face or text pronouns/names). Implicit Association Test (IAT) assessing associations of women with liberal arts and men with science was administered immediately post-task and 3 days later; D scores computed. An additional preregistered study and a variation using generic Google search for text confirmed robustness.
  • Robustness controls: Results held after controlling for linguistic features (ambiguity, word frequency, gendered terms), search frequency, number of faces/images, image ranking, cropping, duplication, presence of animated vs photographed people, and coder demographics/intercoder agreement.
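The two category-level measures described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the coder labels and word vectors below are hypothetical, and the gender axis follows the standard embedding technique of subtracting averaged female-term vectors from averaged male-term vectors, with the raw projections then min-max normalized across categories to align with the image scale.

```python
import numpy as np

def image_gender_score(labels):
    """Map coder labels for a category's images to [-1, 1]:
    -1 = all faces labeled female, 1 = all labeled male, 0 = 50/50."""
    males = sum(1 for lab in labels if lab == "male")
    females = sum(1 for lab in labels if lab == "female")
    return (males - females) / (males + females)

def text_gender_score(category_vec, female_vecs, male_vecs):
    """Project a category's embedding onto a gender axis built from
    averaged male- and female-associated term vectors (raw projection;
    normalized across categories afterward)."""
    axis = np.mean(male_vecs, axis=0) - np.mean(female_vecs, axis=0)
    axis /= np.linalg.norm(axis)
    return float(category_vec @ axis)

def minmax_to_unit_interval(raw_scores):
    """Min-max normalize raw projections to [-1, 1] so the text scale
    is comparable to the image-based scale."""
    s = np.asarray(raw_scores, dtype=float)
    return 2 * (s - s.min()) / (s.max() - s.min()) - 1
```

For example, a category whose three coders labeled the focal faces "male", "male", "female" would score 1/3, i.e., moderately male-skewed.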
Key Findings
  • Cross-modal alignment but stronger image bias: Category-level gender associations in Google Images and Google News were correlated (r = 0.5, P < 0.0001; n = 2,986) but significantly more extreme in images for both female- and male-typed categories (P < 0.0001, Wilcoxon signed-rank).
  • Underrepresentation of women is greater in images: Average male bias μ in text (Google News) = 0.03 (P < 0.0001) vs images (Google Images) = 0.14 (P < 0.0001); mean difference = 0.11 (P < 0.0001); 56% of categories male-skewed in text vs 62% in images (P < 0.0001). Deep learning gender classification corroborated stronger inequality in images.
  • Comparison with public opinion: Text-based associations were significantly less male-skewed than public opinion (mean difference = −0.084, P < 0.001), whereas image-based associations were significantly more male-skewed (mean difference = 0.025, P < 0.001) (n = 2,986).
  • Comparison with US Census (2019 BLS; n = 685 occupations): Text-based associations were neutral (μ = 0, P = 0.65) and significantly less male-skewed than census (μ = 0.08, P < 0.001) and images (μ = 0.15, P < 0.001). Images were significantly more male-skewed than census for the same occupations (mean difference = 0.07, P < 0.001).
  • Experiment—uploads more gendered in Image condition: Participants’ uploaded descriptions were more gendered in Image vs Text condition (mean difference = 0.42, P < 0.0001).
  • Experiment—explicit bias amplified by images: Participants exposed to images reported stronger explicit gender associations than Text (mean difference = 0.06, P < 0.001) and Control (mean difference = 0.06, P < 0.001); no difference between Text and Control (P = 0.56). Strong correlations between uploaded content and explicit ratings across occupations: r = 0.79 for directionality and r = 0.56 for absolute strength (P < 0.0001).
  • Images prime stronger explicit bias holding prevalence constant: When comparing equally gendered materials, those exposed to images reported stronger bias (μ = 0.41) than those exposed to text (μ = 0.35); mean difference = 0.06 (P < 0.0001; t = 4.58).
  • Implicit bias: All conditions showed significant implicit bias (D > 0, P < 0.0001). Image condition had stronger implicit bias than Control (mean difference = 0.11, P = 0.005). Image vs Text difference was not conventionally significant (mean difference = 0.05, P = 0.09). Strong positive relationship between explicit ratings and IAT D scores (P < 0.0001), with greater values in Image condition; only Image condition showed significantly stronger implicit bias than Control 3 days later, indicating persistence.
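The IAT D scores referenced above follow the standard scoring convention: the difference in mean response latency between incompatible and compatible pairing blocks, divided by the pooled standard deviation of all latencies. The sketch below is a simplified version with made-up latencies (the full scoring algorithm also trims extreme latencies and penalizes error trials), offered only to make the statistic concrete.

```python
import numpy as np

def iat_d_score(compatible_rt_ms, incompatible_rt_ms):
    """Simplified IAT D score: (mean incompatible latency - mean
    compatible latency) / pooled SD of all latencies. Positive values
    indicate a stronger stereotype-consistent association."""
    comp = np.asarray(compatible_rt_ms, dtype=float)
    incomp = np.asarray(incompatible_rt_ms, dtype=float)
    pooled_sd = np.std(np.concatenate([comp, incomp]), ddof=1)
    return float((incomp.mean() - comp.mean()) / pooled_sd)

# Hypothetical latencies (ms): slower responses in the incompatible
# block (e.g., pairing "women" with "science") yield D > 0.
d = iat_d_score([600, 650, 700], [700, 750, 800])
```

Equal latencies across blocks give D = 0, so a D significantly above zero in every condition is what the finding "all conditions showed significant implicit bias" reflects.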
Discussion

The results support the hypothesis that online images amplify gender bias relative to text in both statistical prevalence and psychological impact. Image-based representations of social categories are more gender-skewed than text-based representations, underrepresenting women beyond levels observed in public opinion and occupational distributions in US census data. Experimentally, exposure to images increases the strength of explicit gender associations, and even when controlling for prevalence of gendered content, images more strongly prime gendered beliefs than text. There is suggestive evidence that images may also heighten implicit gender bias, including enduring effects after several days. Together, these findings indicate that the internet’s shift toward visual content is likely to exacerbate gender stereotypes, shaping public perceptions of occupations and social roles with downstream implications for inequality and representation. The work highlights the need for interventions in image-centric platforms and for scrutiny of multimodal AI systems trained on image-rich web data that inherit and potentially magnify these biases.

Conclusion

Gender bias online is more prevalent and psychologically potent in images than in text. As digital culture becomes increasingly image-centric—across search engines, social media, and AI-generated imagery—gender stereotypes are likely to be amplified, intensifying underrepresentation of women (and of men in female-typed roles) and entrenching biased beliefs. Future research should investigate social and algorithmic mechanisms of bias in images across gender, race, and other demographics; examine source contributions (e.g., blogs, news, stock photos, celebrity media); and extend multimodal analyses to audio and video, including comparisons of human- and AI-generated content. Developing multimodal frameworks in computational social science and addressing the societal impacts of visual culture are essential for fostering a fair and inclusive internet.

Limitations
  • Gender perception vs self-identification: Image gender labels reflect annotators' perceived gender rather than self-identified gender; non-binary labels (≈2%) were excluded. A replication with celebrity datasets used self-identified gender but pertains to a specific subpopulation.
  • Annotator/sample scope: MTurk annotators were US-based fluent English speakers; generalizability to other cultural contexts may be limited. Experimental participants were US-based; findings may vary internationally.
  • Platform and time constraints: Image data primarily from Google Images (August 2020) and text mainly from Google News (2013 corpus; with robustness checks). Patterns may evolve over time and across platforms.
  • Measurement choices: Word-embedding-based gender dimensions and normalization decisions could influence text-based estimates, though extensive robustness checks were performed. Intercoder reliability, while satisfactory (Gwet’s AC = 0.48), is imperfect.
  • Exclusion/edge cases: Images without discernible faces or with animated figures required coding decisions; some categories may be ambiguous or carry inherent gender terms. Only occupations matched to census data were compared. Implicit bias results are presented as suggestive, with some comparisons not reaching conventional significance and stability varying across preregistered studies.